SPLK-2002 Splunk Enterprise Certified Architect – Indexer Clustering Part 2

  1. Master Indexer

Hey everyone and welcome back. In today’s video we will be configuring the first component of the indexer cluster, which is the master indexer. We have already seen the indexer cluster architecture, so the first thing we need to configure is the master indexer. Once this is configured, we can go ahead and configure peer indexer node one and peer indexer node two. Now, speaking about the master indexer, there are three important pointers to remember. First, a cluster has one and only one master node, so at any time there can only be one master node. Second, the master node is responsible for coordinating the activities of the peer nodes within the cluster. And third, the master node by itself does not store or replicate the data.

So these are the three important pointers with respect to the master indexer node. With this, let’s open up Splunk. This is the master indexer node, which is running on port 8000, so let’s quickly log in here. Within my Splunk Enterprise, I will go to Settings, then Indexer clustering, and we’ll enable indexer clustering. Here we’ll select the master node and click Next. Now, there are four important settings here: the replication factor, the search factor, the security key, and the cluster label. Among these, the replication factor and the search factor are the two most important configuration settings within indexer clustering. So let’s go ahead and understand what replication factor and search factor mean.

Now, the replication factor determines how many copies of data your cluster will maintain. This setting is a key factor in determining the cluster’s fault tolerance. Let’s take a simple example: if we want to ensure that the cluster can handle the failure of up to two peer nodes, then we must configure a replication factor of three, which means the cluster will store three identical copies of data on separate nodes. So here, you will see there are three peer nodes and we have a replication factor of three. That means one copy of data will be stored on each of the peer nodes.

That basically means there will be a total of three identical copies of data on separate nodes. So here, even if node one and node two go down, you still have one identical copy on node three, so your data will be searchable and your data will be intact. Hence the replication factor is one of the most important configuration settings when you configure indexer clustering. The second important factor here is the search factor.
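Before moving on, the replication factor rule above can be written as simple arithmetic. This is a minimal sketch in Python; the function names are hypothetical and exist only to illustrate the rule from the example, they are not part of Splunk.

```python
def tolerated_peer_failures(replication_factor: int) -> int:
    """Peer nodes that can fail while at least one copy of the data survives."""
    if replication_factor < 1:
        raise ValueError("replication factor must be at least 1")
    return replication_factor - 1


def required_replication_factor(failures_to_tolerate: int) -> int:
    """Replication factor needed to survive the given number of peer failures."""
    if failures_to_tolerate < 0:
        raise ValueError("failures_to_tolerate cannot be negative")
    return failures_to_tolerate + 1


if __name__ == "__main__":
    # The example from the video: surviving two peer failures needs three copies.
    print(required_replication_factor(2))
    # The two-node lab used later: two copies survive one peer failure.
    print(tolerated_peer_failures(2))
```

So for the two-peer setup used in this course, a replication factor of two covers the loss of a single peer.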

Now, the search factor determines the number of immediately searchable copies of data that the cluster maintains. There are two important concepts here: the searchable copy and the non-searchable copy. If you remember the video on indexes where we discussed the bucket lifecycle, there was something called a frozen bucket. Within the frozen bucket, all you had was the compressed file. You did not really have the index files there.

So a non-searchable copy is basically plain raw data without any index files. A searchable copy contains the raw data and also contains the index files, so that searching becomes much faster and more scalable. That’s the difference between a searchable copy and a non-searchable copy. Now that we understand the replication factor and the search factor, we can go ahead and configure both of them. Let’s start with the replication factor. We are following the architecture where we have two peer nodes, and what we want is that if one node goes down, the data is still available on the second node. So basically we want two identical copies of data to be present.

So we’ll set the replication factor to two. The search factor we have already discussed. One important pointer to remember is that a non-searchable copy of data uses less disk space than a searchable copy. So you can configure the search factor accordingly; I’ll keep it as two. The next setting is the security key. The security key is used to authenticate the communication that happens between the master node and the peer nodes, as well as when the search head connects. This is like a password that you configure, so in my case I’ll just say password here. And the last setting here is the cluster label.

This is just to identify the cluster. I’ll say kplabs_indexer_cluster. So these are the configuration settings for the master node, and we’ll go ahead and enable the master node. Once you do this, you will have to restart your Splunk instance, so we’ll do a restart now. Perfect, our restart is successful. Let me quickly log in, and this time you land on the indexer clustering page. This is the master node, and currently you just have one tab here, named splunk-midx1, which is the host name of our midx1 server. You do not have any peer nodes yet. Once you configure the peer nodes, you will see much more detail here.
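The four master node settings chosen above can be collected in a small sketch that also checks them for obvious mistakes. The class and method names are hypothetical, and the rule that a blank security key counts as a mistake is an assumption of this sketch rather than something stated in the video.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class MasterSettings:
    """The four values entered on the master node form (hypothetical container)."""
    replication_factor: int
    search_factor: int
    security_key: str
    cluster_label: str

    def problems(self) -> List[str]:
        """Return readable problems; an empty list means the values look sane."""
        found = []
        if self.replication_factor < 1:
            found.append("replication factor must be at least 1")
        if self.search_factor < 1:
            found.append("search factor must be at least 1")
        if self.search_factor > self.replication_factor:
            # Searchable copies are a subset of all copies, so they cannot outnumber them.
            found.append("search factor cannot exceed replication factor")
        if not self.security_key:
            # Assumption of this sketch: treat a blank key as a mistake.
            found.append("security key should not be empty")
        return found


if __name__ == "__main__":
    # Values used in the video.
    settings = MasterSettings(2, 2, "password", "kplabs_indexer_cluster")
    print(settings.problems())
```

A word like password is fine for a lab, but the same key has to be typed on every peer, so it is worth choosing it deliberately.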

Now, before we conclude, I would like to show you how this looks in the CLI. If you do a docker exec -it splunk-midx1 bash, you get a shell inside the container. Within this we’ll go to /opt/splunk/etc/system/local, and within this there is a file called server.conf. Let’s quickly open it, and at the bottom you will see there is a clustering stanza. Within this there are certain configuration parameters which have been set.

One is cluster_label, which is kplabs_indexer_cluster. Then you have the mode, which is master. You have pass4SymmKey, which is basically the password that we had set, and you have the replication_factor of two. So all of the configuration that we did in the GUI gets stored within the server.conf file. That’s about it for the master node configuration. In the upcoming video we’ll go ahead and configure the peer node, and we’ll look into how exactly the peer node gets attached within the master node console.
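For reference, the clustering stanza read out above would look roughly like the sample text in the sketch below. The sample is reconstructed from the four settings mentioned in the video; the key value is a placeholder, since the actual stored value is not shown, and a real server.conf will contain other stanzas too. Python's configparser is used here only as an approximation of the Splunk conf format, and the function name is hypothetical.

```python
import configparser

# Reconstructed from the settings read out in the video; the key is a placeholder.
SAMPLE_SERVER_CONF = """\
[clustering]
cluster_label = kplabs_indexer_cluster
mode = master
pass4SymmKey = <stored-key-placeholder>
replication_factor = 2
"""


def read_clustering_stanza(conf_text: str) -> dict:
    """Return the key/value pairs of the [clustering] stanza."""
    # Interpolation is disabled so characters like '%' in a stored key are left alone.
    parser = configparser.ConfigParser(interpolation=None)
    parser.optionxform = str  # keep key names case-sensitive, e.g. pass4SymmKey
    parser.read_string(conf_text)
    return dict(parser["clustering"])


if __name__ == "__main__":
    for key, value in read_clustering_stanza(SAMPLE_SERVER_CONF).items():
        print(f"{key} = {value}")
```

Reading the file this way is handy when you want to compare the settings of several instances without opening each one by hand.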

  2. Peer Indexers

Hey everyone and welcome back. In the earlier video we configured the master indexer node, so in today’s video we’ll go ahead and configure the peer indexer nodes. Now, the configuration of a peer indexer node is much easier. You don’t really need to deal with aspects like replication factor and search factor, because the master indexer will push all of those configuration settings to the peer nodes. All you have to do is establish the connectivity from the peer node to the master indexer, and that’s about it, hence the configuration is much simpler. Currently I am on localhost:8001, which is our first indexer node, and it is going to be a peer indexer node. The configuration steps are quite similar.

So we’ll go to Settings, then Indexer clustering, and we’ll enable indexer clustering. Since we have already configured the master node, this time we’ll be configuring a peer node. This is going to be our first peer node. Now, if I click Next, it asks me for three things: the master URI, the peer replication port, and the security key. The master URI is basically the host name or the IP address of the master indexer. When you create an indexer node and you want to enable it for clustering, you have to connect it to the master indexer, and hence you have to specify its IP address or host name in the master URI field. In order to do that, let’s quickly do a docker ps, and this is our master indexer Docker container.

Now, if you want to find the IP address of the container, you can do a docker inspect splunk-midx1. Near the bottom you will see that it has the IP address 172.17.0.2. So I’ll specify https://172.17.0.2:8089. In the example shown on the form you will also see https, then the IP address, followed by the port, which is 8089, the management port. The next setting is the peer replication port. There are two peers here, and both of the peers will be replicating data between each other, so which port should they use for replication? I’ll say the replication port is 8080. You can define any port you like; just make sure that it does not conflict with the other ports in use. So the peer replication port is 8080, and then there is the security key.
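The two values entered so far, the master URI and the peer replication port, can be sketched as follows. The helper names are hypothetical, and the set of ports treated as already in use (8000 for the web interface and 8089 for management) is an assumption based on the ports mentioned in the video.

```python
from urllib.parse import urlsplit


def build_master_uri(host: str, management_port: int = 8089) -> str:
    """Build the master URI in the https://host:port form shown on the form."""
    return f"https://{host}:{management_port}"


def check_replication_port(port: int, ports_in_use: set) -> None:
    """Raise ValueError if the replication port is invalid or already taken."""
    if not 1 <= port <= 65535:
        raise ValueError(f"{port} is not a valid TCP port")
    if port in ports_in_use:
        raise ValueError(f"port {port} conflicts with a port already in use")


if __name__ == "__main__":
    uri = build_master_uri("172.17.0.2")
    print(uri)
    print(urlsplit(uri).port)
    # Assumed to be in use on the peer: 8000 (web) and 8089 (management).
    check_replication_port(8080, {8000, 8089})
    print("replication port 8080 is free")
```

If a host name is used instead of the IP address, the peer must be able to resolve it, which is why the container IP is the simpler choice in this lab.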

Now, within the master indexer we had configured the security key. This security key is very important for the authentication to take place: because the peer node connects to the master indexer, it needs the authentication key. So our authentication key is the same one that we configured while creating the master indexer, which in my case is password. I’ll go ahead and enable the peer node. Once the peer node is enabled, you can go ahead and do a restart now. Perfect, the restart is successful. Now, do remember that it automatically redirected to port 8000. Just remember that you have three Splunk instances, on ports 8000, 8001, and 8002, so we are now logging in on port 8000, which is the master indexer. And currently, after connecting peer node one, you see the message that some data is not searchable, but you do have one peer, which is splunk-idx1.

The status is Up and the number of buckets is three. So this is the first peer that we have connected to the master indexer. Now we need to repeat the same process for peer indexer node 02. What I have done is open that node in a different browser altogether. This is peer indexer node 02; you see it’s listening on 8002. Here we’ll apply the same settings. We’ll go to Indexer clustering, enable indexer clustering, and choose the peer node. I’ll specify the address of the master node, https://172.17.0.2:8089, and then the peer replication port. For consistency, keep the peer replication port the same as the one you set on peer node 01, and the security key is the same one that you set before.
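The peer's server.conf is not opened in this video, so the following is only a sketch of what the form values would plausibly produce on a peer. The setting names (master_uri, mode = slave, and a replication_port stanza) are an assumption based on the older naming that matches mode = master on the master node; newer Splunk releases may use different names. The key value is a placeholder and the function name is hypothetical.

```python
import configparser

# Assumed peer-side layout; not shown in the video. The key is a placeholder.
SAMPLE_PEER_CONF = """\
[replication_port://8080]

[clustering]
master_uri = https://172.17.0.2:8089
mode = slave
pass4SymmKey = <stored-key-placeholder>
"""


def summarize_peer_conf(conf_text: str) -> dict:
    """Extract the master URI, mode, and replication port from a peer config."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.optionxform = str  # keep key names case-sensitive
    parser.read_string(conf_text)

    prefix = "replication_port://"
    replication_port = None
    for section in parser.sections():
        # The port is carried in the stanza name rather than in a key.
        if section.startswith(prefix):
            replication_port = int(section[len(prefix):])

    clustering = parser["clustering"]
    return {
        "master_uri": clustering["master_uri"],
        "mode": clustering["mode"],
        "replication_port": replication_port,
    }


if __name__ == "__main__":
    print(summarize_peer_conf(SAMPLE_PEER_CONF))
```

Notice that the peer's file carries no replication factor or search factor, which matches the point made earlier that those settings come from the master.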

I’ll go ahead and enable the peer node. Once you have done this, you will again have to do a quick restart. Once the restart is successful, go to the master indexer and refresh. Now you see you have splunk-idx1 and you have splunk-idx2. And you see it went from data is not searchable to all data is searchable.
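One way to reason about that change in the message: each copy has to live on a different peer, so with a replication factor of two the cluster cannot hold its full set of copies until two peers are up. The sketch below is a deliberately simplified model with a hypothetical function name. It is not how the master node actually decides what to display, since the real master tracks copies bucket by bucket.

```python
def factor_status(peers_up: int, replication_factor: int, search_factor: int) -> dict:
    """Simplified check of whether enough peers are up to hold every copy."""
    return {
        # Each copy needs its own peer, so peers_up caps the achievable copies.
        "replication_factor_met": peers_up >= replication_factor,
        "search_factor_met": peers_up >= search_factor,
    }


if __name__ == "__main__":
    # After the first peer joined (replication factor 2, search factor 2).
    print(factor_status(1, 2, 2))
    # After the second peer joined.
    print(factor_status(2, 2, 2))
```

With only one peer connected, neither factor can be satisfied in this model, and with two peers both can, which lines up with what the dashboard showed before and after the second peer joined.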

We already discussed that the replication factor is two. Now the number of buckets is four on idx1 and four on idx2, so you have identical copies of data across both the peer nodes. If you go into the indexes, you will see that you have an internal index, and both of them are green. That basically means that the data is present across both the peer nodes, and even if one peer node goes down, you will still retain your data. So this is about configuring the peer nodes with the master indexer. I hope this video has been useful for you and I look forward to seeing you in the next video.
