SPLK-2002 Splunk Enterprise Certified Architect – Indexer Clustering

  1. Overview of Indexer Clustering

Hey everyone, and welcome back. In today’s video we will take a high-level look at indexer clustering. In the previous section we discussed the indexer as a component. One important point to remember is that, until now, we have done everything within a single Splunk instance. If that instance goes down, whatever data it holds is either lost or inaccessible until you bring Splunk back up. This is why high availability matters so much in a corporate environment, and it can be achieved with the help of an indexer cluster. Within your Splunk Enterprise instance, if you go to Settings and click on Indexer Clustering, you will see an option for enabling indexer clustering.

By default this is not enabled. If you click on Enable indexer clustering, you are presented with three options: master node, peer node, and search head node. So let’s understand what these mean. The master node coordinates the activities of the peer nodes. The peer nodes are the nodes that contain the actual data; they perform replication and any other activities the master node tells them to do. We can understand this with a simple indexer cluster architecture in which you have one master node and two peer nodes.

The number of peer nodes you need typically varies with the amount of data your organization has. Here we have two peer nodes, but there are organizations with ten or even 20 indexer peer nodes. When you have a larger number of indexer peer nodes, say ten, there needs to be some leader that informs all of the peer nodes about the configuration: the search factor, the replication factor, and various other settings. That leader is called the master indexer. The master indexer is responsible for informing each and every peer node about the configuration it should adhere to. This is why you are asked whether the node you are configuring should be a master node or a peer node.

The search head node is a different aspect altogether and is used for searching. For pure indexer clustering, you need a master node and peer nodes. These are the only two roles you will be working with whenever you work on indexer clustering. One more important point I would like to share before we conclude this video is that the master indexer, or master node, does not take part in replication itself. Let’s say you add data to indexer node one: it gets replicated to indexer node two, or to other indexer nodes present within the cluster, but it does not get replicated to the master indexer. The master indexer is solely responsible for dictating the configuration settings the peer nodes should adhere to; it does not take part in the replication of data. So this is one important aspect to remember.
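To make the split of responsibilities concrete, here is a minimal toy model in Python. It is only a sketch and not how Splunk is implemented: the class names, the bucket identifier, and the replication factor of 2 are all assumptions made for illustration. What it shows is the idea from the video: the master decides where copies go, while only the peer nodes ever hold data.

```python
from dataclasses import dataclass, field


@dataclass
class PeerNode:
    """An indexer peer: the only kind of node that stores indexed data."""
    name: str
    buckets: set = field(default_factory=set)


@dataclass
class MasterNode:
    """Coordinates peers and holds cluster settings; stores no indexed data."""
    replication_factor: int
    peers: list = field(default_factory=list)

    def register(self, peer):
        self.peers.append(peer)

    def replication_targets(self, origin):
        """Pick the other peers that should receive copies of origin's data."""
        others = [p for p in self.peers if p is not origin]
        # The origin already holds one copy, so replication_factor - 1 more are needed.
        return others[: self.replication_factor - 1]


def ingest(master, origin, bucket_id):
    """Store a bucket on the origin peer, then copy it to the chosen targets."""
    origin.buckets.add(bucket_id)
    for target in master.replication_targets(origin):
        target.buckets.add(bucket_id)


# Hypothetical two-peer cluster, matching the architecture in the video.
master = MasterNode(replication_factor=2)
idx1, idx2 = PeerNode("indexer-1"), PeerNode("indexer-2")
master.register(idx1)
master.register(idx2)

ingest(master, idx1, "bucket-001")
print(sorted(idx1.buckets), sorted(idx2.buckets))
```

Notice that MasterNode has no place to store buckets at all. That is the point to take away: data added on indexer one ends up on indexer two, and never on the master.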

So that’s it for today’s video. In the upcoming video we’ll design the infrastructure needed for the master indexer and the two peer nodes. Once we have the infrastructure ready, we can go ahead and configure the indexer cluster.

  2. Deploying Infrastructure for Indexer Cluster

Hey everyone, and welcome back. In today’s video we will look at how to deploy the infrastructure needed for the simple indexer cluster architecture we discussed in the earlier video. To implement it we need three servers, or three Docker containers. For our demos we’ll go with Docker containers because that is the easier approach. If you do not want to go with Docker, you can launch three servers, maybe in AWS, and install Splunk on each of them. Each server will act as one component here. I have already created a sample document with the commands for the three Docker containers.

The first docker run command is for the master indexer node, so the first thing we’ll do is create a Docker container for the master indexer. The name we’ll give this container is splunk-midx1, and the host name will be the same, splunk-midx1. We’ll take port 8000 and connect it to port 8000 of the Docker container. The first 8000 is on my host, which is Windows, and the second 8000 is on the Docker container, so host port 8000 is mapped to container port 8000. So this is the first command. Pretty simple; I’m sure you are already familiar with this. We’ll run it in the CLI. Once it has executed, if you do a docker ps, you will see one Docker container with the name splunk-midx1.

midx1 means master indexer one, so it is easy to distinguish: this is the master indexer, and this is its first server. Once you have the master indexer up, the architecture calls for two more nodes: one for indexer node one and one for indexer node two. The next two docker run commands are precisely for that. In the second command, the naming convention I have used is splunk-idx01. idx stands for indexer, and 01 marks the first indexer. The host name is the same. There is one difference you will see, though.

Now we are doing a port mapping of 8001 to 8000. The reason for using 8001 here is that port 8000 on my Windows machine is already bound by the first container, so we bind port 8001 for the second container. Inside the container, however, Splunk always runs on 8000; it is just that port 8001 on my Windows host is mapped to port 8000 in the container. So I’ll copy this and execute it. Now if you do docker ps, you have the second Docker container, splunk-idx1. So far we have created master indexer one and then indexer 01.

Now you need to create indexer two. The naming convention, as you know, is splunk-idx2. The port this time is 8002 on my Windows machine, and the Docker container port is 8000. That is the only change. I’ll copy this, do a cls to clear the screen, paste it here, and press Enter. Now if you do a docker ps, there are three Docker containers present: splunk-midx1, splunk-idx1, and splunk-idx2.

So let’s quickly verify whether Splunk is really up in all of these containers. First I’ll open localhost on port 8000, and our Splunk container seems to be working fine. Now let’s open localhost on port 8001, and you see it has begun to load.

One important thing to remember here is that if you are running a lot of Docker containers, they will use a lot more resources. So always keep an eye on your Task Manager; you can see that a good amount of CPU is being used here. Just make sure you have a good amount of CPU and memory available. Since I am also recording, which generally takes a good amount of CPU, it will be a little slow on my end, but you can try it out.
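The exact commands from the sample document are not reproduced in this transcript, so here is a hedged sketch of what the three docker run commands might look like, generated with a small Python script. The container names and port mappings follow the video. Everything else is an assumption: the splunk/splunk image name, the SPLUNK_START_ARGS and SPLUNK_PASSWORD environment variables (taken from the public Splunk Docker image), and the placeholder password. Newer image versions may require additional licence-acceptance variables.

```python
import shlex

# Assumed values: the image name and environment variables follow the public
# splunk/splunk Docker image; the password is a placeholder to replace.
IMAGE = "splunk/splunk:latest"
PASSWORD = "ChangeMe-123"  # hypothetical placeholder

# (container name, host port); names as spoken in the video.
NODES = [
    ("splunk-midx1", 8000),  # master indexer
    ("splunk-idx1", 8001),   # peer indexer one
    ("splunk-idx2", 8002),   # peer indexer two
]

# Splunk Web listens on 8000 inside every container.
CONTAINER_PORT = 8000


def docker_run_command(name, host_port):
    """Build one docker run command mapping host_port to the container's 8000."""
    args = [
        "docker", "run", "-d",
        "--name", name,
        "--hostname", name,
        "-p", f"{host_port}:{CONTAINER_PORT}",
        "-e", "SPLUNK_START_ARGS=--accept-license",
        "-e", f"SPLUNK_PASSWORD={PASSWORD}",
        IMAGE,
    ]
    return shlex.join(args)


commands = [docker_run_command(name, port) for name, port in NODES]
for command in commands:
    print(command)
```

In each -p value the left side is the host port and the right side is the container port. Only the left side changes between containers, because the host can bind each of its ports just once, while every container has its own port 8000.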

If you do not have a very fast desktop, you can instead create three servers, install Docker on each of them, and run the containers there. That is just an FYI. Perfect: on localhost:8001 I now have Splunk up and running. The last one is localhost:8002, and perfect, that one is up as well. To recap, port 8000 is the master indexer, 8001 is peer indexer 01, and 8002 is peer indexer two. So this is the basic configuration that we have created.

Again, many of the people viewing this video may not have a very fast laptop. If that is you, do wait for some time while Splunk gets configured inside the containers. It can sometimes take five minutes for this to work, particularly if you are running multiple Docker containers. If this setup does not work for you, you can launch three servers, maybe in AWS, install Splunk on each of them, and then continue with the setup. So this is the high-level overview of deploying the infrastructure needed for the indexer cluster. In the upcoming video we’ll go ahead with the practicals and configure the master node as well as the peer nodes.
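If you would rather not keep refreshing the browser while the containers start, a small polling script can do the waiting for you. This is only a sketch using the Python standard library; the function names are made up for illustration, and the default of 300 seconds simply mirrors the five minutes mentioned above. Note that an open TCP port only tells you something is listening, not that Splunk Web has finished loading.

```python
import socket
import time


def port_is_open(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def wait_for_ports(host, ports, max_wait=300.0, interval=5.0):
    """Poll until every port accepts connections or max_wait seconds pass.

    Returns the set of ports that never opened (empty means all are up).
    """
    pending = set(ports)
    deadline = time.monotonic() + max_wait
    while pending:
        pending = {p for p in pending if not port_is_open(host, p)}
        if not pending or time.monotonic() >= deadline:
            break
        time.sleep(interval)
    return pending


# Single pass over the three host ports used in the video (max_wait=0 means
# check once and report); drop max_wait to wait up to five minutes instead.
still_down = wait_for_ports("localhost", [8000, 8001, 8002], max_wait=0)
print("not reachable:", sorted(still_down) or "none")
```

Once the script reports that nothing is unreachable, open localhost on ports 8000, 8001, and 8002 in the browser to confirm the Splunk login page actually loads.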
