PL-300 – Section 9: Part 1 Level 8: Other Visualization Items Part 3

  1. Use Anomaly Detection

Now, suppose, you wanted to go beyond just identifying outliers and look for anomalies. In other words, values that are so much of an outlier that they may be a mistake. Well, we can do this with the anomaly detection.

Now, anomaly detection only works on line chart visuals. So, for instance, a scatter graph, it wouldn’t work at all. So, let’s go to page three where we have got this line chart. Now, we need to have a time series data in the axis here, and you can see, we have indeed got that. So, this looks like we are ready for an anomaly detection. So to go to anomaly detection, we click on this Analytics tab again. And here we can see right at the bottom, find anomalies, but it’s greyed out. There’s a problem. What’s that?

Well, the problem is that it doesn’t work at the moment on visuals that have got two or more fields in the values field well. So, I can’t detect it yet because there are these two. So, let’s go to page one where we don’t have this problem, we’ve just the one line. Okay, so let’s add an anomaly their detection.

So, I’ll go down and right at the bottom, we’ve got anomaly detection. Find anomalies. And again, there’s a different problem. Anomaly detection and forecasting can’t be enabled at the same time. So, you remember, in an earlier video, we added all of this forecasting. So, you can see, it doesn’t work with absolutely all line charts containing time series data but it does work with quite a lot of them. So, what I’m going to do is get rid of this forecast that we’d previously put in. And now I can go to find anomalies. So, I’m going to add it on. And you can see that there is a sensitivity here. So, what this means is how far adrift on where you’d expect it to be can it be? So, if I change the sensitivity, and apply, you will see that more values have come up. If I decrease it, more values go down. Now, if I click on any one of these, then we’ve got an explanation for the anomaly. It is unexpectedly high.

Now, why is it unexpectedly high? Well, you can explain it by a particular data field. So, I could drag into the date the year and say, “Okay, can you try and explain it By Year?” Apply. So, we can’t find any significant explanations for this anomaly. However, if I change it to month, you may find that in your data that there would be a particular reason for something, like for instance, a statistical, it’s normally a lot higher. So, here we can see 16% possibility that it’s because the month is March. The sales volume for March was unusually high, which may have lifted the sales volume year-on-year percentage total. So, you can also see what it is up here. So, that’s the why and here’s the what. It is a range of a value of 56.77%, which is above the expected range of -13 to -3%.

Now, we can change the anomaly shape, and size. So, we can change it to squares, for instance, and we can make them bigger. And what we could also do is have an expected range transparency. So, we have got, if I just get rid of this anomalies pane, this is the area where it expects the values to be. So, you can see this shaded area. Expects it to be roughly in that area. And you can have it really dark, or fairly light. And again, this area will increase or decrease, depending on the sensitivity. So, you can see if I decrease the sensitivity, then I would increase the area. So, if I decrease it to 20%, as long as it’s within this range, a huge range, then it’s not going to consider it an anomaly. If I increase sensitivity, then the sensitivity range tightens and I increase the number of anomalies that I’ve got.

So, this is the anomaly detection under the find anomalies section in the analytics section of the visualisations. So, right at the bottom, at the moment, new features might come in, of course, and push it slightly up. So, it is for line chart visuals containing time series data in the axis field. It requires at least four data points. And it doesn’t support with legends, multiple values or secondary values, or forecasts. So, that is anomaly detection.

  1. Use Groupings and Binnings

In this video, we’re going to group together values. And there’s two ways of doing this, depending on whether you’re grouping together numerical or time data, or whether you are grouping together text, for instance.

So, let’s create a very simple chart. So, this is going to be a clustered column chart. And we’re just going to have a region name in the axis and sales volume in the values. So, we’ve got Greater Manchester, West Midlands, West Yorkshire, Merseyside, South Yorkshire and Tyne and Wear. Okay, I have a hypothesis, I have an idea that all of the regions with West in their title will be better than all of the other regions.

So, what I’m going to do is I’m going to select these two regions, by clicking on one, holding down CTRL and clicking on another, that allows you to select individual items, holding down the CTRL.

Now, I’ve got these two selected, I’m going to right click and go to Group Data.

Now, notice what happens on the right hand side. We have a Region Names, bracket Groups. It turns this into the legend, so a different colour, but that’s not necessarily what I want. So, what I’m going to do, I’m going to remove region name, and I’m going to drag region name groups up from legend to axis.

So, now, if we have a look at the data, in fact, I’ll just turn this side on, we can see that there is West Midlands and West Yorkshire, and other. And you can see, I was wrong with my idea, at least in totality. Maybe, if I turn this back and put this into the legend, and I put a year in the axis, maybe I’ll be right some year, and no I’m not. We’ve got West Midlands and West Yorkshire being below all of the other items.

Now, let’s have a look at this group. I’m going to click on this little arrow and go to Edit Groups. So, here we can see the group we can rename it, we can also rename it here if I double click on it, but we can also see what field it’s on, we can’t edit that. We can also see the group type as a list, and we can see what groups we have. We have West Midlands and West Yorkshire, which includes, obviously, West Midlands, West Yorkshire, those are the two that I clicked on. And then, we have an Other Group, and you can see Include Other Group, and that contains everything else that’s ungrouped. So, maybe, I don’t want the other group and I want to create a new group. So, I’m going to highlight all of these by clicking on the first one and holding down Shift and click on the list and I’m going to click Group and say The Others. So, this will do exactly the same thing except I am manually doing it. So, here are the others.

Now, maybe, I’ve got that hypothesis wrong, maybe it’s not West Midlands or West Yorkshire. I’m going to ungroup all of these, I’m going to include the other group, and instead of West Midlands or West Yorkshire, I’m going to say all of the Yorkshires. So, I’m going to ungroup West Midlands, and I want South Yorkshire to be part of West Yorkshire. So, I’ve clicked on South Yorkshire, I’ve clicked on the group West Yorkshire and I’ve grouped it together. So, now we have the group of South Yorkshire and West Yorkshire. So, now let’s see how these two are. And you can see even bigger disparity. But this just allows me to see what would happen if everything was combined. If I want to rename any of these groups, by the way, then I can double click on them, and so, I can say The Yorkshires for instance. Click OK. So, we have The Yorkshires instead of the individual South Yorkshire and West Yorkshire.

Now, the other type of grouping can be done with numerical values, and this is technically known as binning. So, we put all of the numbers into various bins. So, let’s give an example.

Let’s have the clustered column chart. And I’m going to have the sales volume in the axis and the sales volume in the values, except what I want is not the sum up but the count. So, I want to know how many different instances of each of these sales volumes there are. And you can see, generally, there’s one or two, there’s quite a few of zeros because we have got the last two months all being zero sales volume, we don’t have the data at the moment, and there’s six regions. So, two times six is 12.

However, I’m more interested, not in each individual figure, I’m not really interested in the sales volume 961, I’m more interested in ranges of figures. So, I want to group together the sales volumes. So, if I click on the little down arrow and go to New Group, here, we can see that we have a bin type. So, we are able to do lists just like the previous one, but bin’s more used for numerical date fields. So, you can see that we have a size of bin or number of bins. So, we can either say I want 24 bins, or you can say I want each bin to be a certain size. So, I want each bin to be not in bin size of 263. So, zero to 263, 263 to 526, that doesn’t work in real life. Let’s make it around 500. So, we can see the minimum value is zero, maximum value 6,332. So, in bins of 500, we can see a much more orderly possession. So, we have the highest sales volume at around the 1500 range. So, is 1500 to 1,999 and then decreasing there afterwards. And we can alter the bins, edit them by clicking on the drop down arrow and going to Edit Bins. So, we can say, actually, I want to see what it’s like a 400 bin size, and see whether that gives me any more particular data or analysis that I might choose.

So, grouping, you can group together textual information by clicking on them, holding down CTRL, and then you can right click and go to Group Data.

Alternatively, you can select a particular field and go to New Group and then select whether you want a bin or a list.

img