Kafka Streams is a client library that gives organizations an efficient framework for processing streaming data. It offers a streamlined way to build applications and microservices that must process data in real time to be effective. Using the Streams API within Apache Kafka, the solution fundamentally transforms input Kafka topics into output Kafka topics. The key benefit is that Kafka Streams pairs the ease of using standard Java and Scala application code on the client end with the strength of Kafka’s robust server-side cluster architecture.

Kafka Streams Advantages

Kafka’s cluster architecture makes it a fault-tolerant, highly scalable, and especially elastic solution, able to handle hundreds of…


Unlocking the full potential of PostgreSQL JSON

1. What Is JSON?

JSON was designed as an open, lightweight, text-based data-interchange format for exchanging data between web browsers and backend servers. It works natively with JavaScript (hence the name “JavaScript Object Notation”, rather than “J’Son, the only son of the previous Emperor of Spartax”, a Marvel character), and as of 2017 it is an IETF standard (RFC 8259).

JSON has a very simple basic structure, but it can be nested, so you may end up with a more complex structure!
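To make the point about nesting concrete, here is a minimal sketch using Python's standard `json` module. The document contents and the `depth` helper are made up for illustration and do not come from the original post.

```python
import json

# Hypothetical nested document: an object holding a string, a number,
# a nested object, and an array of objects.
text = """
{
  "name": "example",
  "version": 1,
  "owner": {"id": 42, "tags": ["a", "b"]},
  "items": [{"sku": "x1", "qty": 2}, {"sku": "y2", "qty": 5}]
}
"""

doc = json.loads(text)


def depth(value):
    """Return the nesting depth of a parsed JSON value (scalars are depth 0)."""
    if isinstance(value, dict):
        return 1 + max((depth(v) for v in value.values()), default=0)
    if isinstance(value, list):
        return 1 + max((depth(v) for v in value), default=0)
    return 0


# Nested values are reached by chaining keys and indexes.
second_tag = doc["owner"]["tags"][1]
total_qty = sum(item["qty"] for item in doc["items"])
```

Each building block (object, array, string, number) stays simple; the complexity comes only from how deeply they are combined.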


Getting to Know ELK Stack: Elasticsearch, Logstash and Kibana

What Is the ELK Stack and Where Does Logstash Fit?

Logstash is a tool designed to aggregate, filter, and process logs and events. Logstash can take a variety of inputs from different locations, parse the data in different ways, and output to different destinations. One of the more powerful destinations for Logstash is Elasticsearch, where the logs can be indexed and searched. Aside from the fast searchability, once the data is available in Elasticsearch it can easily be visualized using Kibana.

Kibana can be used to create visual dashboards so that you can make your data work for you. You can see trends, explore data visually, and create meaningful graphs…


Apache Kafka Architecture — A Complete Guide

Apache Kafka is a distributed streaming platform with plenty to offer — from redundant storage of massive data volumes, to a message bus capable of throughput reaching millions of messages each second. These capabilities and more make Kafka a solution that’s tailor-made for processing streaming data from real-time applications.

Despite its name’s suggestion of Kafkaesque complexity, Apache Kafka’s architecture actually delivers an easier-to-understand approach to application messaging than many of the alternatives. Kafka is essentially a commit log with a very simple data structure. It just happens to be an exceptionally fault-tolerant and horizontally scalable one.
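The commit log idea can be sketched as a toy in-memory model. This is not Kafka's implementation: the `CommitLog` class and its method names are hypothetical, and partitions, replication, and persistence are all left out.

```python
class CommitLog:
    """Append-only log: records get sequential offsets and are never modified."""

    def __init__(self):
        self._records = []

    def append(self, record):
        """Add a record at the end of the log and return its offset."""
        self._records.append(record)
        return len(self._records) - 1

    def read(self, offset, max_records=10):
        """Return up to max_records records starting at the given offset."""
        return self._records[offset:offset + max_records]


log = CommitLog()
first = log.append("order-created")
second = log.append("order-paid")
log.append("order-shipped")

# Each reader tracks its own position, so reading removes nothing from the log.
reader_a = log.read(0)
reader_b = log.read(second)
```

Because records are only ever appended and addressed by offset, several readers can consume the same log independently, each from its own position.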

The Kafka…


Cassandra Data Modeling Guide to Best Practices

Apache Cassandra is an open source non-relational, or NoSQL, distributed database that enables continuous availability, tremendous scale, and data distribution across multiple data centers and cloud availability zones. Simply put, it provides a highly reliable data storage engine for applications requiring immense scale.

Data modeling is a process used to analyze, organize, and understand the data requirements for a product or service. Data modeling creates the structure your data will live in. It defines how things are labeled and organized, and determines how your data can and will be used. The process of data modeling is similar to designing a…


Introduction to Cassandra Monitoring

Cassandra Monitoring

Apache Cassandra is a NoSQL database designed to provide scalability, reliability, and availability with linear performance scaling. The Cassandra database is designed as a distributed system and aims to handle big data efficiently. Refer to the “What is Apache Cassandra” and “Cassandra architecture” posts for more information. Note that knowledge of Cassandra architecture and basic terminology is a prerequisite to understanding Cassandra monitoring.

Cassandra monitoring is an essential area of database operations to ensure the good health of a cluster and optimal performance. Alerting is another crucial area for production systems, and it is complementary to monitoring. Good alerting in Cassandra can be achieved by utilization of…


In the third blog of the “Around the World” series, focusing on globally distributed storage, streaming, and search, we build a Stock Broker Application.

1. Place Your Bets!

London Stock Exchange, 1800s

How did Phileas Fogg make his fortune? Around the World in Eighty Days describes Phileas Fogg in this way:

Was Phileas Fogg rich? Undoubtedly. But those who knew him best could not imagine how he had made his fortune, and Mr. Fogg was the last person to whom to apply for the information.

I wondered if he had made his fortune on the Stock Market, until I read this:

Certainly an Englishman…


The Cassandra architecture components you need to be aware of to derive unmatchable ROI from a database designed for scalability, availability & reliability

Comprehensive guide to Apache Cassandra Architecture

The Apache Cassandra architecture is designed to provide scalability, availability, and reliability to store massive amounts of data. If you are new to Cassandra, we recommend going through the high-level concepts covered in what is Cassandra before diving into the architecture.

This blog post aims to cover all the architecture components of Cassandra. After reading the post, you will have a basic understanding of the components. This can be used as a basis to learn about the Cassandra Data Model, to design your own Cassandra cluster, or simply for Cassandra knowledge.

Cluster Topology and Design

Cassandra is based on a distributed system architecture. In its…


Instaclustr is pleased to announce the general availability of the Elasticsearch Managed Service on the Instaclustr Managed Platform, which uniquely places us as the leading Managed Service partner to enterprise businesses for three leading data layer technologies (Apache Cassandra, Apache Kafka, and now Elasticsearch):

  • all highly available and massively scalable technologies that address your data layer requirements at blazing speeds.
  • all truly open source technologies (with Apache 2.0 license), ensuring businesses aren’t stuck with escalating license fees simply because of vendor lock-in for niche functionality.

Our Elasticsearch service is based on the Open Distro for Elasticsearch, a 100% open source distribution…


Apache Cassandra 4.0 brings a long-awaited feature for tracking and logging database user activity. Primarily aimed at providing a robust set of audit capabilities that allow Cassandra operators to meet external compliance obligations, it brings yet another enterprise feature into the database. Combined with the work on the full query log capability, the audit log capability lets operators audit all DML, DDL, and DCL changes to either a binary file or a user-configurable source (including the new Diagnostics notification changes).

This capability will go a long way toward helping Cassandra operators meet their SOX and PCI…

Instaclustr

Managed platform for open source technologies including Apache Cassandra, Apache Kafka, Apache ZooKeeper, Redis, Elasticsearch and PostgreSQL
