ASQ Six Sigma Green Belt – Objectives – Part 4
We also looked at factorials, permutations and combinations. Now, coming to the second important topic, which is the central limit theorem. This is one of the important concepts in statistics which you need to have a very clear idea about. Some of the things which we discuss in this lecture might not make immediate sense to you.
So what I will suggest is, as you go further into this course and learn some more concepts, come back and look at this lecture once again. The central limit theorem is based on sampling, and sampling is one of the important aspects of statistics. In statistics there are two main branches: descriptive statistics and inferential statistics. In descriptive statistics, you look at the data in hand and you try to make sense out of it. So for example, if you have a lot of numbers, you take the mean of those, you find out the standard deviation of those; that is descriptive statistics.
We will cover descriptive statistics in more detail as we go further into this course. In addition to descriptive statistics, there is inferential statistics. What we do in inferential statistics is infer from the sample which we take. So from the whole population, from the big lot, we take out some samples and we study those samples.
So, if you look at the diagram here, we have a population, which is the bigger circle, and the smaller circles are samples, random samples which we have taken from the population a number of times. You really cannot study the whole population; you cannot look at the height of all the people in the country. What you can do is take a sample and study the heights in that. And based on that, you make a judgment about what the average height is, or what the standard deviation of height in the country is.
So you look at the sample, and based on the sample result, you predict about the population. Whatever number you get from the sample is called a statistic. And here I'm talking about a statistic, not statistics. Statistics is the branch of mathematics; a statistic is a summary of the sample data which you have. So based on this statistic, you make an inference about the population. Anything which is related to the population is called a parameter.
Many times, as I said, we might not know the parameters. So what we do is, based on the statistic, we infer what the parameter could be. Based on the height of some of the people in the sample, we infer the average height of people in the country, which is the parameter. There are two important measurements: one is the mean and the second is the standard deviation. When we have a statistic, or when we are dealing with sample data, these are shown by x bar and s. X bar is the sample mean and s is the sample standard deviation.
When we talk of parameters, we show these two things as mu, the population mean, and sigma, the population standard deviation. So this was a basic understanding of sampling. Now, we do a number of things using sampling. Let's take an example here where I have some measurement with a mean of 100. This mean of 100 could be the average length of the piece which I am producing. And this has a standard deviation of one, which is shown by the normal distribution curve on top of the population.
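As a small illustration of the statistic versus parameter distinction, here is a minimal sketch that builds a simulated population of piece lengths with mean 100 and standard deviation 1 (the same numbers as the example that follows) and compares mu and sigma with x bar and s from one small sample. The simulated data and variable names are my own, not part of the course material.

```python
import random
import statistics

random.seed(1)

# Simulated "population": 100,000 piece lengths with mean ~100 and sigma ~1 (assumed data)
population = [random.gauss(100, 1) for _ in range(100_000)]

# Parameters: describe the whole population (mu and sigma)
mu = statistics.mean(population)
sigma = statistics.pstdev(population)

# Statistic: describes only the sample we actually measured (x bar and s)
sample = random.sample(population, 4)
x_bar = statistics.mean(sample)
s = statistics.stdev(sample)

print(f"parameters: mu = {mu:.2f}, sigma = {sigma:.2f}")
print(f"statistics: x bar = {x_bar:.2f}, s = {s:.2f}")
```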
We have not talked about the normal distribution curve yet; we will talk about that. So if you want, after learning those things you can come back and look at this lecture once again. So this is the population. Since the average length here is 100, most of the items have a length around 100; some have a length of 99, and some have a length of up to 102 as well. So from this population I draw four samples, and I get these four values: 102, 101, 99 and 100. This is my sample.
So what I do is calculate the mean of this sample. The mean of this sample comes out to be 100.5. So this is my x bar. Based on this mean I can predict about the population. So my prediction about the population will be that the population mean is 100.5. But actually we know that this is 100, because this is the big data set. Still, based on the sample, we predict that the mean is 100.5. What we then do is find the confidence interval.
The concept of the confidence interval is not included as part of Six Sigma Green Belt, but I will just give you a brief understanding of it here; the details are covered in the Black Belt course. So there is a formula for the confidence interval, which you can see here: x bar plus or minus z alpha by two times sigma divided by square root of n. Using this formula, I come out with a confidence interval of 100.5 plus or minus one. This is at a 95% confidence level, because when you are taking a sample you cannot be 100% sure.
So with 95% confidence I can say that the population mean is somewhere between 100.5 plus or minus one, that means between 99.5 and 101.5. Somewhere in between that is the population average, which is here 100, so it falls in this range. So this is the confidence interval. Don't go into the details of the confidence interval. I just want you to focus on one particular aspect here, which is sigma divided by square root of n. Just keep this in mind, because what we are talking about in this topic is the central limit theorem.
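To see where the plus or minus one comes from, here is a minimal sketch of that confidence interval calculation with the four values from the example, assuming a known population sigma of 1 and a 95% confidence level (z of about 1.96); the variable names are mine, not from the course material. The margin works out to about 0.98, which the lecture rounds to one.

```python
import math

# Sample drawn from the process (the four values from the example)
sample = [102, 101, 99, 100]
n = len(sample)

x_bar = sum(sample) / n      # sample mean (the statistic), 100.5
sigma = 1.0                  # population standard deviation, assumed known
z_alpha_2 = 1.96             # z value for a 95% confidence level

standard_error = sigma / math.sqrt(n)   # sigma / sqrt(n)
margin = z_alpha_2 * standard_error     # ~0.98, roughly the "plus or minus one"

print(f"x bar = {x_bar}")
print(f"95% CI = {x_bar - margin:.2f} to {x_bar + margin:.2f}")  # ~99.52 to 101.48
```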
And when we go further we will talk about this term, which is also called the standard error. Just keep this in mind. This is one application of the central limit theorem which we are learning. The second application of the central limit theorem is in hypothesis testing. We will be talking about hypothesis tests as we go further into this course. And what we do in a hypothesis test is draw some samples.
These are exactly the same samples which we talked about in the previous example: 102, 101, 99 and 100. Based on those, I want to judge, I want to infer, whether things have changed or not. So this was a machine which was supposed to produce items with an average of 100 and a standard deviation of one. This was the machine which was running.
I just picked four samples out of that, and based on those I want to judge whether the machine has shifted and the machine settings need to be redone or not. I calculate the z value, and based on that, I decide whether I reject the null hypothesis or fail to reject the null hypothesis. We will talk about these concepts as we go further, so don't worry much about this.
The only thing you need to focus on at this time is this one, which is sigma divided by square root of n. This is the same term we talked about earlier when we looked at confidence intervals. And remember, we have not talked about the central limit theorem itself yet; the only thing I'm showing here is that sigma divided by square root of n keeps appearing.
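As a rough sketch of the calculation being described (the full hypothesis-testing procedure comes later in the course), the snippet below computes the z value for the same four measurements, assuming the hypothesized mean is 100, a known sigma of 1, and a two-sided test at the 5% level; the names and the 1.96 cut-off are my choices for illustration.

```python
import math

sample = [102, 101, 99, 100]
n = len(sample)
x_bar = sum(sample) / n

mu_0 = 100.0   # hypothesized (historical) process mean
sigma = 1.0    # known process standard deviation

# z uses the standard error sigma / sqrt(n), the same term as before
z = (x_bar - mu_0) / (sigma / math.sqrt(n))   # (100.5 - 100) / 0.5 = 1.0

# Compare against the critical value for a two-sided test at alpha = 0.05
z_critical = 1.96
if abs(z) > z_critical:
    print(f"z = {z:.2f}: reject the null hypothesis (the machine seems to have shifted)")
else:
    print(f"z = {z:.2f}: fail to reject the null hypothesis (no evidence of a shift)")
```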
Now, coming to the third application of the central limit theorem, which is in control charts. In control charts we take some samples, and based on those samples we draw a control chart something like this, where we have the mean, an upper control limit and a lower control limit. What we do is, from the machine or from the process, we keep on drawing a certain number of items, let's say five items or four items, take the mean of those, and put it here; this is my X bar R chart. We will talk about these control charts later on. So, here is my mean. I draw five samples, take the average of those and plot that mean here; then I take another five samples, take the average of those and put it here.
As long as my average is within the upper and lower control limits, everything is fine; that means the process is in control. What I want you to look at here is that the control limits are decided by this formula, which is the grand average x double bar plus or minus A2 times R bar. Here A2 is what decides how wide these limits are going to be. So when I keep on taking averages and keep on plotting them, how wide are my upper and lower control limits? That is decided by A2. And if you look at A2, it keeps on reducing as the number of samples increases. So if I draw two samples, my control limits will be one width; if I start drawing four samples, my control limits will be much narrower, because now my A2 has changed from 1.88 to 0.729.
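To make the A2 idea concrete, here is a minimal sketch that computes X bar chart control limits as x double bar plus or minus A2 times R bar for a few subgroup sizes, using the standard published A2 constants. The x double bar and R bar figures are made-up example numbers, not values from the lecture.

```python
# Standard A2 constants for X bar / R charts, indexed by subgroup size n
A2 = {2: 1.880, 3: 1.023, 4: 0.729, 5: 0.577}

x_double_bar = 100.0   # grand average of the subgroup means (example value)
r_bar = 2.0            # average subgroup range (example value)

for n, a2 in A2.items():
    ucl = x_double_bar + a2 * r_bar
    lcl = x_double_bar - a2 * r_bar
    print(f"subgroup size {n}: LCL = {lcl:.2f}, UCL = {ucl:.2f}")
# The limits get narrower as the subgroup size grows, mirroring sigma / sqrt(n)
```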
Let's talk about the first aspect of the central limit theorem, which is here. So let's say this is number one. We say that for almost all populations, the sampling distribution of the mean can be approximated closely by a normal distribution, provided the sample size is sufficiently large. If that doesn't make sense yet, don't worry, we will talk about it. Let's say we have some data which is not normally distributed. So let's say this is the distribution of the data which we have, and it is nowhere near a normal distribution. What we do here is pick, let's say, five items from this. So item number one which I pick comes from here, item number two from somewhere here, then three, four, five. So I pick five items from this particular distribution and take the average of those. This becomes my x1 bar, the first average, and I put it on a chart, which is here. Then I take another five samples, take the average of those, and plot that average on this same chart. So the chart on the top is the distribution of the population, or whatever we have as input. What we have on the bottom is the distribution of the means of the samples, the sample means. So these are not individual items on this plot; on this plot we are plotting the sample means.
So what the central limit theorem says is that this distribution of sample means will be normally distributed even if your original distribution is not normally distributed. This makes a lot of sense in control charts, where we take samples and we are not sure whether the population from which those samples come is normally distributed or not. What we do know is that the averages which we plot on the control chart follow the normal distribution. This is one simple concept. So whatever distribution you have, if you take some samples, take the average of those samples and plot that, this new plot, which is the plot of sample averages, will be a normal distribution. Coming to the second important aspect of the central limit theorem, which is here. So this is aspect number two. What we say here is about the place from where we are taking the samples.
Let's say that particular population has a mean of mu and a variance of sigma squared. And here I am saying a variance of sigma squared, which means a standard deviation of sigma; we will cover these aspects as well once we talk about descriptive statistics later. So this is the original distribution, which has mu as the mean and sigma squared as the variance. Now we take samples of size n from this; each time we draw n items, and n could be two, five or twenty. Then, as we said earlier, the distribution of the sample means will be a normal distribution.
This distribution of sample means will have a mean equal to mu x bar and a variance equal to sigma x bar squared, and these are the formulas for both of them. So let's understand these. We have a distribution, let's say, which was not a normal distribution, whatever distribution it was. We draw, let's say, nine samples from it and plot their average on a distribution here; we draw another nine samples and plot their average here. We keep on plotting these averages, and you will see that they follow a normal distribution. The mean of your original distribution, which is mu, will be the same as the mean of these averages, mu x bar. This is one thing.
The second thing is the variance of this second distribution, which is sigma x bar squared, because what we have in this second case are x bars. So sigma x bar squared will be sigma x squared divided by n, or, if I take the square root of everything, sigma x bar is equal to sigma x divided by the square root of n. If this looks complicated, let's look at the example on the next slide, which will make more sense. What it tells you in general is that as the number of items which you draw as a sample increases, the standard deviation of the sample means gets smaller, so the distribution of sample means becomes narrower, and we can see that here. So here is one distribution. This is my population, the histogram of the population, and from here I am taking some samples.
So let's say I take one sample at a time and make a distribution of that; this will be my distribution, and it will look similar to the original distribution, because we are taking only one item at a time, and we have done this 10,000 times. So that is the distribution with one sample at a time. Instead of one sample at a time, if I take five samples at a time and plot the averages of those, the averages will show a distribution something like this. So this is with five samples, which is not exactly a normal distribution, but it starts to look slightly like one. Instead of five, if I take ten samples, average them and plot the distribution of that, it starts looking like a normal distribution. The next one is with 30 items, the next one with 50 items drawn at a time, and at the bottom I have 100 items drawn at a time. Even if you have not followed anything I talked about previously, if you just focus on this one chart, there are two things you will see here. The first is that as the sample size increases, the distribution of sample means, which is here, starts looking like the normal distribution.
So if you look at 30 here: if I draw 30 samples and plot the distribution of the sample averages, it will look like a normal distribution. This is the first thing we talked about. The second important thing we talked about in the central limit theorem was that as the number increases, the width of this curve keeps on reducing. So here, if sigma was the standard deviation of this population, then the standard deviation of the curve which you are getting for the sample averages will be sigma divided by square root of n. So the more samples you have, the narrower your curve will be. These two things alone make up the central limit theorem, nothing else.
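If you would like to reproduce a picture like the one on that slide, a sketch along the following lines works; the choice of matplotlib and of a skewed exponential population are my own assumptions, not the instructor's actual data.

```python
import random
import statistics
import matplotlib.pyplot as plt

random.seed(0)
# A skewed, clearly non-normal population (assumed for illustration)
population = [random.expovariate(1.0) for _ in range(100_000)]

sample_sizes = [1, 5, 10, 30, 50, 100]   # items drawn per sample, as on the slide
fig, axes = plt.subplots(len(sample_sizes), 1, figsize=(6, 12), sharex=True)

for ax, n in zip(axes, sample_sizes):
    # Distribution of 10,000 sample means, each computed from n items
    means = [statistics.mean(random.sample(population, n)) for _ in range(10_000)]
    ax.hist(means, bins=50)
    ax.set_title(f"sample size n = {n}")

plt.tight_layout()
plt.show()
```

As the sample size grows, the histograms become narrower and more bell-shaped, which is exactly the visual point of the slide.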
So everything in the central limit theorem you can see visually on that single slide. As I said previously, the standard deviation of the sample means, or technically speaking the standard deviation of the sampling distribution of sample means, is this one, which is sigma x divided by square root of n. This is also called the standard error of the mean. So you will see this term being used as well, standard error of the mean, rather than the big technical phrase "standard deviation of the sampling distribution of sample means".
But what this is, is the standard deviation of the graph, the distribution, which we had for the sample means. And if you remember, earlier when I was talking about the confidence interval, I said keep your focus on this term, sigma divided by square root of n. When I was talking about hypothesis testing, this was the term I was pointing out. And in the control chart I was showing the constant A2, which keeps reducing as the number of samples increases.
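To tie these pieces together, here is a small simulation sketch (my own illustration, not from the course) that checks both claims numerically for an arbitrary non-normal population: the mean of the sample means comes out close to mu, and their standard deviation comes out close to sigma divided by square root of n, the standard error.

```python
import random
import statistics

random.seed(0)

# A clearly non-normal population: exponential with mean 1 (its sigma is also 1)
population = [random.expovariate(1.0) for _ in range(100_000)]
mu = statistics.mean(population)
sigma = statistics.pstdev(population)

n = 9                  # items drawn per sample
num_samples = 10_000   # how many sample means we collect

sample_means = [statistics.mean(random.sample(population, n))
                for _ in range(num_samples)]

print(f"population mean mu      = {mu:.3f}")
print(f"mean of sample means    = {statistics.mean(sample_means):.3f}")   # close to mu
print(f"sigma / sqrt(n)         = {sigma / n ** 0.5:.3f}")
print(f"std dev of sample means = {statistics.stdev(sample_means):.3f}")  # close to sigma / sqrt(n)
```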