SPLK-2002 Splunk Enterprise Certified Architect – Post Installation Activities Part 2

  1. Importance of Source Types

Hey everyone and welcome back. In today’s video we will be discussing the importance of the source type field that you generally set on your logs. Now, one important part to remember is that in Splunk, field extractions are generally defined at the source type level, unless you are using something like the rex command in SPL. At a global level, they are defined in the props.conf as well as the transforms.conf files.
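
As a quick illustration of the exception I mentioned, here is a minimal SPL sketch of an inline extraction with the rex command; the field name client_ip is a hypothetical one I am introducing here, assuming an Apache-style event:

    sourcetype="access_combined"
    | rex field=_raw "^(?<client_ip>\d{1,3}(?:\.\d{1,3}){3})"
    | stats count by client_ip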

Now, if the source type for your log file is incorrect, then it will not get parsed properly. Splunk comes with certain built-in source types and associated regexes for the common logs that you see, like Apache and similar. So let’s jump directly into the practical to understand this in a much better way. I am in my Search and Reporting app. Let’s go here and filter by sourcetype equal to access_combined. During the earlier section we had imported some sample Apache logs.

So it had a source type of access_combined. Now, if you look over here, all of these logs are extracted properly: you have a proper field name on the left and the associated value on the right. That means the field extractions have been done properly. Now let’s do one thing, let’s import the same logs, but this time we’ll put them under a different source type. So let’s do that quickly. I’ll go to Add Data and upload the relevant access log file. My access log file is close to around 4 MB, which sounds good enough. I’ll upload that, and on the next screen you basically have to select a source type. Now, it automatically detected a source type, which is access_combined.

But what I intend to show is what happens if you select an incorrect source type. So currently, if you go into the source type dropdown, you will see that the source types are divided into certain categories. For example, the Database category has source types associated with the MySQL slow query log and the MySQL daemon, and you have email-specific source types, application-specific source types and various others. So let’s do one thing: let’s go to Settings, then Source Types, and we’ll create our own source type here. On the left hand side you see there are a lot of source types present. These are the default source types which generally come with Splunk. As we have already discussed, Splunk does come with basic source types for the common log files that you will find in most organizations. So we’ll create a new source type. Let’s name it kplabs_demo. All right?

And for Event Breaks, let’s set it to Every Line, and we’ll click on Save. Once we do that, we’ll have to refresh the page, as it will not appear directly. So I’ll quickly re-upload the access log file, and this time we’ll go into the Custom category, and within it you have kplabs_demo. This is the source type that we are giving. I’ll go to Next, and for the host value you can give it whatever you intend to. Let’s put it in a separate index; I’ll say the index name would be demo_sourcetype. I’ll click on Save, go to Review and click on Submit.
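
Behind the scenes, saving that source type in the UI writes a stanza into a local props.conf. A minimal sketch, assuming the Every Line event-breaking policy we just selected (your version may persist slightly different keys):

    [kplabs_demo]
    # "Every Line" in the UI: do not merge lines, break on newlines
    SHOULD_LINEMERGE = false
    LINE_BREAKER = ([\r\n]+)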

So once the file is uploaded successfully, let’s click on Start Searching, and it will select the index that we just created. All right, so this is the index that we just created. Now, if you try to open an event up, you see that not everything gets extracted over here. When you compare it with the access_combined source type, you will see that very few fields are being extracted, and this is not what is intended. Basically, what you want is that each and every field here should be extracted, and since you do not have the data under the right source type, Splunk will not extract them. Now, coming back to the access_combined source type that we were speaking about, let’s do that once again. So I’ll say sourcetype equal to access_combined.

For the time range I’ll give it All Time, and if I just open an event up, you will see each and every field is extracted, and every field also has an appropriate field name which is easy to understand. Now, all of these things basically come from props.conf as well as transforms.conf. So let’s find this out and look into where exactly those aspects are stored. You can go directly to the CLI as well, but let’s start at the high level: click on access_combined, go to Advanced, and you see that it has certain configurations set, including a value called access-extractions. So let’s quickly go to the CLI. Now I’m in my directory of /opt/splunk/etc/system/default, and let’s open transforms.conf; within this file you should see a stanza called access-extractions.

So let’s copy this and do a search. Let’s go a bit down, and you will see that there is an appropriate regex for it. Now, this regex is called specifically if the source type is access_combined; only then will this particular regex be invoked. However, if you give a source type which is not access_combined, like we did with kplabs_demo, then none of these configuration settings will apply, and hence your data will not get extracted. And this is the reason why it is important to have the right source type defined whenever you upload a log file.
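
To summarize the wiring, here is a simplified sketch of the pairing. The REGEX shown is illustrative only; the regex that actually ships with Splunk is far longer and more complete:

    # /opt/splunk/etc/system/default/props.conf
    [access_combined]
    REPORT-access = access-extractions

    # /opt/splunk/etc/system/default/transforms.conf
    [access-extractions]
    # Illustrative only -- the shipped regex covers many more cases
    REGEX = ^(?<clientip>\S+)\s+\S+\s+(?<user>\S+)\s+\[(?<req_time>[^\]]+)\]\s+"(?<request>[^"]*)"\s+(?<status>\d+)\s+(?<bytes>\S+)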

  2. Interactive Field Extractor (IFX)

Hey everyone and welcome back. In today’s video we will be discussing the Splunk interactive field extractor, which is also referred to as the IFX. Now, typically, whenever you want to extract fields from your log data, writing a regex is the ideal way. However, many times, if you want to wrap things up quickly or if the log file has a simple pattern, then the interactive field extractor can do things quickly for you. The interactive field extractor basically allows us to teach Splunk how to extract fields from your data without really writing a regex. So in the earlier few videos we have been writing regexes and we were looking into how Splunk actually extracts the data based on them. However, in today’s video we’ll look into how we can make things work without really writing those complex regexes. Now, if you remember, in the earlier video we had created an index called demo_sourcetype and we had also uploaded a sample Apache access log. Now, if you see over here, not every field is extracted. For example, the first field is the IP address, and within here you cannot really see any IP address portion being extracted. The reason for this is, first, that the data is not under the right source type where you have a proper regular expression, and second, that you do not have a regex for this specific source type.

So let’s look into how we can make use of the interactive field extractor to at least get a better field extraction than what we have right now. In order to do that, you have to click on Extract New Fields, and this will bring up a new page. Within this new page you have to select a sample log event. So I’ll just click on one sample log event, it brings that event in, and you can click on Next. On the next page you have two options: either you can write a regular expression that will parse the fields for you, or you can do it based on delimiters. We’ll go with the delimiter option and select Next. Now, on the resulting screen you see there are certain delimiters that you can use: space, comma, tab, pipe and others. So let’s select space here. And as soon as I selected space, you see how well the log file got extracted.

So now you have the IP address, you have field2, field3, you have the timestamp, you have the request (basically this is the request URI), you have the response code, you have the bytes and various others. From here you can even assign a field name. So let’s name the first field src_ip. I’ll rename the fourth field as timestamp, I’ll rename the fifth field as, let’s say, request_code, and field6 I’ll name response_code, and that’s about it. This is the only thing that we’ll do as of now. Now, if you go a little bit down, you see I have src_ip here and it has extracted the source IP. You also have the timestamp. The reason why you have a hyphen in some events is because within the log file you don’t really have any value associated there.

So you have src_ip, you have timestamp, you have request_code. This is what we were setting up here. What you can do now is click on Next. In case you want to rename each and every field, you can do that, but just for our demo we are renaming only certain fields. So I’ll go ahead and click Next. Now you have to give an extraction name here, so let me say demo data, and for the permissions I’ll just give it All Apps, and I’ll click on Finish. Once you have clicked on Finish, you can go back to the Search and Reporting app and you can put in index equal to demo_sourcetype. Perfect. So now that we have our data loaded, if you try to open an event up, you see the field extraction is much better now.
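
Under the hood, a delimiter-based extraction like this is typically saved as a DELIMS/FIELDS transform rather than a regex. A minimal sketch, assuming the field names we chose above and a hypothetical stanza name derived from our extraction name:

    # transforms.conf (written by the IFX on our behalf)
    [demo_data]
    DELIMS = " "
    FIELDS = "src_ip","field2","field3","timestamp","request_code","response_code"

    # props.conf
    [kplabs_demo]
    REPORT-demo_data = demo_data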

So you have src_ip, you have the response code, and you have certain fields that you had not renamed earlier, but that is something you can always do. Now, if you quickly want to see the list of IP addresses, you can do a stats count by src_ip, and now you see the list of IP addresses which are part of the access log. One thing that I wanted to share is that writing a regex does not always make sense. In this example we saw how quickly we were able to extract the relevant fields with the interactive field extractor, and it is much faster. If you quickly want to analyze some data which has a simple kind of format, then IFX makes sense. However, on the production side, it is always recommended to use a regular expression.
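
For reference, the quick tally we just ran looks like this in SPL, assuming the src_ip field name from our extraction:

    index="demo_sourcetype"
    | stats count by src_ip
    | sort -count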

  3. props.conf and transforms.conf

Hey everyone and welcome back. Before we begin the video, I would just like to share that today is Diwali here in India, so happy Diwali to all of the viewers who are watching. Now, you might hear a bit of music in the background; this is primarily because it’s playing everywhere. I have tried my best to isolate things, but you might hear a little background music, so you can ignore that. So anyways, today’s topic is props.conf and transforms.conf. During your Splunk projects and assignments, props.conf and transforms.conf are files you will be working with quite often, on a regular basis, so let’s go ahead and understand both of them. Now, generally, the default installation of Splunk comes with certain default source types and their associated field extractions for various common log files, whether it is Catalina, access_combined or others.

We do have the flexibility to create our own source types as well. Generally, whenever we create our own source type, we have to supply its associated configuration settings, and these configuration settings are generally put in the props.conf and transforms.conf files. So let’s understand this with a simple example so that it makes sense. I’m in my Splunk, and if you go to Settings, on the right hand side there is a Source Types entry.

Now, if you look on the left hand side, there are various source types available; these are the ones which come with Splunk by default. So let’s open up the access_combined source type. It has certain settings: Event Breaks is Every Line, and if you go into Timestamp there is the timestamp prefix, which is the prefix that precedes the timestamp. If you look at an access_combined log file, you will understand what this is. And if you go into the Advanced settings, you will see that this specific source type, access_combined, has various configuration settings associated with it: CHARSET is UTF-8, you have REPORT-access which is access-extractions, you have TIME_PREFIX, you have the category, you have NO_BINARY_CHECK set to true, and you can add various other configuration settings which are required for your source type. So let’s understand how exactly this works. We’ll take this specific setting, REPORT-access, whose value is access-extractions, and look into what exactly this configuration setting does. I am in my Splunk server, so let’s go to /opt/splunk/etc/system, and since we know that there are certain default source types which come prebuilt, we need to go into the default directory. Within the default directory there are two files of interest: one is props.conf and the second is transforms.conf.
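
Put together, the Advanced screen corresponds to a props.conf stanza along these lines. This is a sketch from memory, so individual values (particularly TIME_PREFIX) may differ slightly from what your version ships with:

    [access_combined]
    CHARSET = UTF-8
    REPORT-access = access-extractions
    TIME_PREFIX = \[
    category = Web
    NO_BINARY_CHECK = true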

So let’s open up props.conf and see what it contains. We’ll go a bit down, and you will see that every source type stanza is named after the source type and has certain configuration settings associated with it. We’ll be looking at access_combined, and we’ll look at one more afterwards for ease of understanding. So let’s go a bit down. Perfect. So this is the access_combined source type. Now, typically, whatever configuration you would generally give in the GUI can also be managed here; not everything can be done via the GUI, but everything can be done via the CLI or through configuration files, which is a capability Splunk supports.

And generally, whatever configuration settings you give there get stored in the file called props.conf. If you look, you have REPORT-access = access-extractions. What this basically means is that since this is a source type, and we know that every source type needs certain regexes so that the files you import get extracted, a reference to those regexes is declared here.

That regex is defined in the access-extractions stanza, which is present in transforms.conf. That is what this basically means. So now, if you open up transforms.conf and go a bit down to access-extractions, you will see that there is an [access-extractions] stanza, and it has the regex associated with the access_combined source type. So all of those regexes can be defined within the transforms.conf file.

Now, one important part to remember is that you can define regular expressions in props.conf as well. But if you put them in transforms.conf, they can be reused by multiple props.conf stanzas. So this is considered a best practice: in props.conf you say REPORT-access, which works like a reference, and you specify the stanza name, access-extractions, which holds the actual regular expression used to extract the access_combined source type. Great. So now let’s do one thing, let’s clone this. We’ll do a clone here, and I’ll name it access_combined_test. Now, everything is cloned here.

Now, one thing that we’ll change is access-extractions: we’ll change it to access-extractions-test. All right, we’ll go ahead and save this. Perfect. So now you have access_combined_test. Since this is a modified value, it will not be present within default; we have to go to local, and if I do an ls, you will see that there is a new file called props.conf. Now, if you open up this props.conf, you will see the configuration settings remain the same. The one thing that we edited is REPORT-access: it now has access-extractions-test. Great. So these are the basics. Now, if you remember, we do not have a transforms.conf entry associated with it yet. So let’s do one thing, let’s go ahead and import some data and see whether it gets extracted or not.
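
At this point, the local props.conf should contain roughly the following. A sketch, since the clone also copies the other settings from access_combined:

    # /opt/splunk/etc/system/local/props.conf
    [access_combined_test]
    # ...other settings copied from access_combined...
    REPORT-access = access-extractions-test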

So I’ll add the access log file, and this time it automatically detects the source type. However, we want our custom source type. Since we cloned it, the source type we created should be under the Web category, and if you look here: access_combined_test. We’ll go ahead and click on Next. For the index, let’s just put it in the default index for now, and I’ll go ahead and click on Submit.

Now, when you do Start Searching, you will see that not every field has been extracted. One example: take the first field, which is the IP address, and you’ll see the IP address is not extracted. The reason it is not extracted is that we have defined REPORT-access, which works like a reference, and we said that access-extractions-test is the stanza name which will hold the regular expressions, but we do not actually have this stanza defined in transforms.conf at all, and this is why it is failing. So, in order to have the fields extracted, we need to create a transforms.conf that has this stanza with its associated regular expressions. So let’s do one thing: let’s do a cat, open up default, and look at transforms.conf. Within transforms.conf, the value that we need right now is the regular expression, and this is the access-extractions stanza.

Let’s copy this entirely, and within local I’ll do nano transforms.conf and paste everything here. This basically includes the regular expressions, and we have to rename the stanza to access-extractions-test. Now let’s quickly verify that both sides carry the -test suffix, and they do. Great. That is basically what is needed. Now, if you just press Search again and you try to open up the events, you see the fields are much better now: you have the client IP address, you have the item ID, the requested product, et cetera, all coming in perfectly as we expected. So this is what props.conf and transforms.conf are all about. However, in order to understand this better, let’s take one more example and look into one more setting within a default source type.
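
The resulting file should look roughly like this; a sketch with the regex elided, since in practice you paste the long shipped regex verbatim:

    # /opt/splunk/etc/system/local/transforms.conf
    [access-extractions-test]
    # REGEX copied verbatim from [access-extractions] in
    # /opt/splunk/etc/system/default/transforms.conf
    REGEX = ...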

So let’s go to Source Types and look at MySQL this time. There are a lot of source types; one is mysqld_error. Let me open this up, and if you go into Advanced, again it has certain configuration parameters. Now, one configuration parameter that we can take is BREAK_ONLY_BEFORE. This is an interesting configuration parameter, so let’s look at how exactly it works. I have a sample mysqld_error file.

And if you see over here, this is the format. Now, basically, when you push this data to Splunk, Splunk should know what constitutes an individual event. The first line is one individual event, then you have the second individual event here, which spans across two lines, then you have the third individual event here, and you have the fourth individual event here. Now, how will Splunk know that there are four events? There should be a way for Splunk to identify that these two lines form a single individual event. And the way Splunk does that is through the BREAK_ONLY_BEFORE parameter, which has a regular expression matching digits. And how many digits are there? There are six digits.

So if you look at every MySQL error log event, it starts with a six-digit number. This six-digit number is something your configuration will look for, and it will break there. So as soon as it finds a six-digit number, it will consider the entire block which starts with that six-digit number as an individual event. Even if you have an event spanning five or six lines here, it will not matter; Splunk will be able to identify those five or six lines as a single event.
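
In props.conf terms, the relevant part of the stanza looks roughly like this. A sketch, since the shipped regex may also anchor on the time that follows the six-digit date:

    [mysqld_error]
    # Start a new event only on lines beginning with a six-digit
    # date stamp (e.g. 190114 for 2019-01-14)
    BREAK_ONLY_BEFORE = ^\d{6}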

So that is what this is all about. Now, let’s try it out, because nothing really sinks in unless and until we do it practically; that is the best way to learn. So let me do a clone here; I’ll name it mysqld_error_test, all right? This time, within Advanced, we could remove the BREAK_ONLY_BEFORE parameter, but let’s go the easy way right now and just go to Add Data; removing the parameter is something you can try out yourself as a practical. So I’ll upload the data here, and this time I have selected the mysql_error.txt file.

Now, if I do a Next here, Splunk will not be able to identify this data properly. This is the reason why you will see it has taken all of the data within the mysql_error.txt file as a single event. The reason is that Splunk has not been able to identify the source type, and since the mysqld_error source type has a specific configuration which says that it needs to look for this regex and break events whenever it finds this regex at the start of a line, without it Splunk is not able to break the events right now.

But as soon as you configure the right source type (let’s go to the Database category this time and select mysqld_error), you see the events are now broken accordingly. So this is the importance of BREAK_ONLY_BEFORE, and this is its regex. Now, similar to this, every configuration that you see over here has its own importance. So in case you want to understand what each one of these does, let’s take BREAK_ONLY_BEFORE and look within the props.conf documentation of Splunk.

So this is how the documentation looks. What they have done is define every setting within this file reference. So what you can do is a Ctrl+F, and you can specify which setting you are interested in learning about. I search for BREAK_ONLY_BEFORE, go down, and it says: when set, Splunk software creates a new event only if it encounters a new line that matches the regular expression. We already tested how exactly this behaves. So anytime you come across certain configuration parameters which you do not understand, you can go to the props.conf documentation, do a Ctrl+F, search for that specific key, and you will have the documentation right there.
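
As a handy companion to the documentation, you can also ask a running instance which settings actually apply to a source type, and which file each one comes from, using Splunk’s btool utility:

    # Show the effective props.conf settings for mysqld_error,
    # with the file each setting comes from
    /opt/splunk/bin/splunk btool props list mysqld_error --debug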
