Installing Cassandra on CentOS 7: A Step-by-Step Guide to Setting Up a Distributed Database

Apache Cassandra is an open-source distributed database management system built for speed and scalability. It handles large volumes of structured data spread across many servers, and its support for clustering, replication, and multi-data-center replication provides redundancy, failover, and disaster recovery.

This guide walks you through installing and configuring Cassandra on CentOS 7.

Prerequisites

  • A server running CentOS 7.
  • A non-root user with sudo privileges configured on your server.

Getting Started

Begin by updating your system to the latest available packages using the following command:

sudo yum update -y

Cassandra runs on the Java Virtual Machine, so Java must be installed first. Install it with the following command:

sudo yum install java -y

After installing Java, verify its installation and version with this command:

java -version

You should see a result similar to:

openjdk version "1.8.0_151"
OpenJDK Runtime Environment (build 1.8.0_151-b12)
OpenJDK 64-Bit Server VM (build 25.151-b12, mixed mode)
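
If you want to check the Java version from a script rather than by eye, the version string can be parsed. The following is a minimal sketch; the function name java_major_version is hypothetical, and it assumes the version line has the form shown above (a quoted version string after the word "version").

```shell
#!/bin/sh
# Extract the major Java version from a line such as: openjdk version "1.8.0_151"
# Legacy "1.x" strings (Java 8 and earlier) map to x; strings such as "11.0.2" map to 11.
# java_major_version is a hypothetical helper name, not part of any Java tooling.
java_major_version() {
    # $1: the version line printed by `java -version`
    version=$(printf '%s\n' "$1" | sed -n 's/.*version "\([^"]*\)".*/\1/p')
    case "$version" in
        "")  return 1 ;;                                   # no version string found
        1.*) printf '%s\n' "$version" | cut -d. -f2 ;;     # 1.8.0_151 -> 8
        *)   printf '%s\n' "$version" | cut -d. -f1 ;;     # 11.0.2    -> 11
    esac
}

# `java -version` writes to stderr, hence the 2>&1 redirection.
if command -v java >/dev/null 2>&1; then
    version_line=$(java -version 2>&1 | grep 'version "' | head -n 1)
    echo "Detected Java major version: $(java_major_version "$version_line")"
fi
```

With the sample output above, the function prints 8.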

Installing Cassandra

Cassandra is not included in the default CentOS 7 repositories, so you must add a repository that provides it. This guide uses the DataStax Community repository. Create a file named cassandra.repo in the /etc/yum.repos.d directory using the command below:

sudo nano /etc/yum.repos.d/cassandra.repo

Include the following configuration:

[cassandra]
name = DataStax Repo for Apache Cassandra
baseurl = http://rpm.datastax.com/community
enabled = 1
gpgcheck = 0

Save and close the file. Note that gpgcheck = 0 disables GPG signature checking for packages from this repository. Next, refresh the package metadata so that yum picks up the new repository:

sudo yum update -y
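
As an alternative to editing the file in nano, the same repository definition can be written non-interactively, which is convenient when provisioning several servers. This is only a sketch; the function name cassandra_repo_definition is hypothetical, and the content is copied from the configuration shown above.

```shell
#!/bin/sh
# Print the repository definition from this guide to stdout.
# cassandra_repo_definition is a hypothetical helper name.
cassandra_repo_definition() {
    cat <<'EOF'
[cassandra]
name = DataStax Repo for Apache Cassandra
baseurl = http://rpm.datastax.com/community
enabled = 1
gpgcheck = 0
EOF
}
```

To install the file, pipe the output through tee with sudo, since /etc/yum.repos.d is writable only by root:

cassandra_repo_definition | sudo tee /etc/yum.repos.d/cassandra.repo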

Now install Cassandra. The dsc20 package is the DataStax Community distribution of Cassandra 2.0:

sudo yum install dsc20 -y

Once the installation completes, start Cassandra and enable it to start at boot with these commands:

sudo systemctl start cassandra
sudo systemctl enable cassandra

To check Cassandra’s service status and ensure it’s running correctly, use the command below:

sudo systemctl status cassandra

You should see output similar to:

● cassandra.service - SYSV: Starts and stops Cassandra
   Loaded: loaded (/etc/rc.d/init.d/cassandra; bad; vendor preset: disabled)
   Active: active (exited) since Sun 2017-12-17 17:53:58 IST; 12s ago
     Docs: man:systemd-sysv-generator(8)
  Process: 15323 ExecStart=/etc/rc.d/init.d/cassandra start (code=exited, status=0/SUCCESS)

Dec 17 17:53:55 centOS-7 systemd[1]: Starting SYSV: Starts and stops Cassandra...
Dec 17 17:53:56 centOS-7 su[15332]: (to cassandra) root on none
Dec 17 17:53:58 centOS-7 cassandra[15323]: Starting Cassandra: OK
Dec 17 17:53:58 centOS-7 systemd[1]: Started SYSV: Starts and stops Cassandra.
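
For a SysV-wrapped unit like this one, active (exited) indicates that the start script returned successfully; the Cassandra process itself may still need some time before it answers requests. If you script the installation, a small polling helper can wait for a check command to succeed. The sketch below is an illustration under that assumption; the function name wait_until and its arguments are hypothetical.

```shell
#!/bin/sh
# Run a command repeatedly until it succeeds or the attempts run out.
# $1: maximum attempts, $2: seconds to sleep between attempts, rest: command to run
# wait_until is a hypothetical helper name.
wait_until() {
    attempts=$1
    delay=$2
    shift 2
    i=1
    while [ "$i" -le "$attempts" ]; do
        # Discard the command's output; only its exit status matters here.
        if "$@" >/dev/null 2>&1; then
            return 0
        fi
        i=$((i + 1))
        # Skip the sleep after the final failed attempt.
        [ "$i" -le "$attempts" ] && sleep "$delay"
    done
    return 1
}
```

For example, the following waits up to about a minute for the nodetool status command (covered in the next section) to succeed:

wait_until 30 2 nodetool status && echo "Cassandra is responding"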

Connecting and Verifying Cassandra Cluster

With Cassandra running, you can proceed to verify and connect to your Cassandra cluster.

First, check the status of your Cassandra cluster using:

sudo nodetool status

If the node is healthy, the output resembles the following. UN in the first column means the node is Up and in the Normal state:

Datacenter: datacenter1
=======================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address    Load       Tokens  Owns (effective)  Host ID                               Rack
UN  127.0.0.1  65.15 KB   256     100.0%            516af85e-2e6a-454a-b27f-6eacafa6b978  rack1
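
In a monitoring or provisioning script, the first column of this output can be checked automatically. The following is a minimal sketch that assumes the column layout shown above; the function name count_unhealthy_nodes is hypothetical.

```shell
#!/bin/sh
# Count nodes whose status/state code is anything other than UN (Up, Normal).
# Reads `nodetool status` output on stdin and prints the count.
# count_unhealthy_nodes is a hypothetical helper name.
count_unhealthy_nodes() {
    # Node rows start with a two-letter code: U or D, followed by N, L, J, or M.
    # Header and legend lines do not match this pattern and are ignored.
    awk '$1 ~ /^[UD][NLJM]$/ && $1 != "UN" { n++ } END { print n + 0 }'
}
```

Usage:

nodetool status | count_unhealthy_nodes

For the single-node output above, the function prints 0.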

To connect using the Cassandra Query Language shell, type:

cqlsh

A successful connection prints:

Connected to Test Cluster at localhost:9160.
[cqlsh 4.1.1 | Cassandra 2.0.17 | CQL spec 3.1.1 | Thrift protocol 19.39.0]
Use HELP for help.
cqlsh> 

To exit the Cassandra command line interface, type:

cqlsh> exit
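
To confirm that the node accepts reads and writes, not just connections, you can run a short CQL script. The sketch below only generates the script; the keyspace name demo_ks, the table name greetings, and the function name smoke_test_cql are all hypothetical, and a replication factor of 1 is assumed because this guide sets up a single node.

```shell
#!/bin/sh
# Print a small CQL script: create a keyspace and table, insert one row, read it back.
# demo_ks, greetings, and smoke_test_cql are hypothetical names used for illustration.
smoke_test_cql() {
    cat <<'EOF'
CREATE KEYSPACE IF NOT EXISTS demo_ks
  WITH replication = {'class': 'SimpleStrategy', 'replication_factor': 1};
CREATE TABLE IF NOT EXISTS demo_ks.greetings (id int PRIMARY KEY, message text);
INSERT INTO demo_ks.greetings (id, message) VALUES (1, 'hello');
SELECT * FROM demo_ks.greetings;
EOF
}
```

Save the script to a file and pass it to cqlsh with the -f option, which executes the file and then exits:

smoke_test_cql > /tmp/smoke_test.cql
cqlsh -f /tmp/smoke_test.cql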

FAQ

Do I need root privileges to install Cassandra on CentOS 7?

Installation requires root-level privileges, but you do not need to log in as root. Running the commands as a non-root user with sudo privileges is the recommended, more secure approach.

What Java version is required for Cassandra?

Cassandra requires a Java runtime. OpenJDK 8, which this guide installs, or the Oracle JDK is typically recommended for compatibility.

Does Cassandra installation impact system performance?

Cassandra is a distributed database designed for high performance, and its resource usage depends on data size, cluster size, and configuration. Appropriate hardware and configuration are essential for optimal performance.

How can I ensure Cassandra starts automatically?

Enable the service with sudo systemctl enable cassandra. Cassandra will then start automatically at boot.