ASQ Six Sigma Green Belt – Objectives – Hyperledger Part 7
Take the time it takes to travel from my home to the office. If I take a number of readings, note them down, and plot a histogram, I will see that the travel time is normally distributed. There are lots of examples of the normal distribution. One property of the normal distribution is that it is symmetrical around the center, that is, around the mean. It has long tails, and the curve has a bell shape, something like this. I'm sure many of you are aware of that. This is the typical shape of a normal distribution. This is the center, and the graph is symmetrical around it. The tails are long, and they never actually touch the x axis.
They only approach the x axis as you go toward infinity. In a normal distribution, the mean, mode, and median are the same. We have not talked about these yet, but we will cover them in descriptive statistics as we move further. For a normal distribution, the mean, mode, and median are all at the same place. So these are some typical properties of a normal distribution. Two things define a normal distribution: one is the mean and the second is the standard deviation.
So let's look at these normal distributions: this is one normal distribution, and this is another. This is the mean. The mean tells you the location, that is, where the curve sits. Let's say this one is at 50 mm and this one is at 100 mm. That means these are two different items: one has a mean of 50 mm and the other a mean of 100 mm. So the mean is the first thing, and it defines the location of the normal distribution. The second is the standard deviation.
The standard deviation is the variation: it tells you how much spread the curve has. You can have a curve like this, with its center here. Or you could have a curve like this, or one like this. All three curves have the same center, the same mean, but they have different standard deviations, and that changes the shape of the curves. If you have a higher standard deviation, the curve will be wider. If you have a low standard deviation, or low variation in the data, the curve will be narrower. So here are a few things you might want to remember.
As we move further, you will see more details, but just to give you an idea, let me plot a normal distribution. This is the center. Let me mark the standard deviations in red. This is one sigma from the center, this is two sigma, and this is three sigma. The same applies on the other side, starting with minus one sigma.
So here is minus one sigma, then minus two sigma, and then minus three sigma. In a normal distribution, the area within plus or minus one sigma, that is, one standard deviation, is about 68%. Let me shade it in green: this area is 68% of the area under the curve. Within plus or minus two sigma you have about 95% of the area (more precisely about 95.45%). And within plus or minus three sigma you have 99.7% of the area under the curve. What does this mean? We will see that the area over a particular range gives the probability. So here is the summary. The total area under the normal curve is one, or 100%, which corresponds to 100% probability. So for this normal distribution the area under the curve is one, and since the curve is symmetrical, 50% of the area is on the right of the mean and 50% is on the left. Another important thing to understand is that this is for a continuous variable. Earlier, when we talked about the binomial and Poisson distributions, those were for discrete variables, and if you remember, their plots looked like steps. Here you don't have steps; you get a smooth curve, because this is for a continuous variable.
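The 68%, 95%, and 99.7% figures can be checked numerically. The sketch below is an illustration rather than part of the course material: it uses Python's standard-library statistics.NormalDist (Python 3.8+) in place of a printed table, and area_within is a hypothetical helper name.

```python
from statistics import NormalDist


def area_within(k: float) -> float:
    """Area under a normal curve within +/- k standard deviations of the mean."""
    std = NormalDist()  # standard normal: mean 0, standard deviation 1
    return std.cdf(k) - std.cdf(-k)


for k in (1, 2, 3):
    # prints 0.6827, 0.9545, 0.9973 for k = 1, 2, 3
    print(f"within +/-{k} sigma: {area_within(k):.4f}")
```

Because the rule is stated in units of sigma, the same three numbers apply to every normal distribution, whatever its mean and standard deviation.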
One important point is that the probability of any particular value is zero. This might look a little strange, but suppose we have a class with a number of students, and based on their heights we plot a normal distribution: the class has an average height of 150 and a standard deviation of 5.

So one, two, and three standard deviations above the mean are 155, 160, and 165, and three below the mean is 135. So plus or minus three sigma runs from 135 to 165. What I want to say here is that if you ask me the probability of getting a student with a height of exactly 150, or 161, or whatever single number you give, the answer is zero. There is zero probability of getting a student with a height of exactly 150 or 160 or 170 if you name a single value, because it's not possible to get a height of exactly 150.00.
No matter how many zeros you add, you cannot get a student with exactly that height. This is why, for any continuous variable, the probability of one particular value is zero. Instead, we look at the probability of a value being less than something, greater than something, or between two values. In this class, you can ask me: what is the probability that a student's height is greater than 150? Looking at this chart, greater than 150 means half of the curve, the right side. So the chance that a student's height is greater than 150 is 50%. Another important thing we talked about earlier is that the area within plus or minus two sigma is roughly 95%.
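To see why a single value has zero probability while an interval does not, here is a small sketch using Python's standard-library statistics.NormalDist with the class example (mean 150, standard deviation 5). The helper prob_between is a hypothetical name chosen for illustration; probabilities come from differences of the cumulative area.

```python
from statistics import NormalDist

# Class heights from the example: mean 150, standard deviation 5
heights = NormalDist(mu=150, sigma=5)


def prob_between(dist: NormalDist, lo: float, hi: float) -> float:
    """P(lo < X < hi) for a continuous normal variable X."""
    return dist.cdf(hi) - dist.cdf(lo)


# Shrink an interval around 150: the probability shrinks toward zero,
# and a zero-width interval (a single value) gives exactly 0.0
for half_width in (1.0, 0.1, 0.001, 0.0):
    print(half_width, prob_between(heights, 150 - half_width, 150 + half_width))

# P(height > 150) is the right half of the curve
print(1 - heights.cdf(150))  # 0.5
```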
So the area within plus or minus two sigma is 95%. Let me shade it in blue. If this area is 95%, then the remaining 5% lies outside it, split equally between the two tails, so each tail is 2.5%. Now, if you ask me: what is the probability that a student's height is greater than 160?
Here 160 is at the two-sigma level. So the probability of a height greater than 160 is 2.5%, because the area to the right of this point is 2.5% (using the rounded 95%; the exact tail area beyond two sigma is about 2.3%). This is how you find probabilities here. So far we have only used specific points: the areas within plus or minus one, two, and three sigma. To find areas for values in between, there are tables available, and there is also an Excel formula; we will talk about all of that.
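The rounded 95% figure gives 2.5% for heights above 160. A quick sketch with Python's standard-library statistics.NormalDist shows the unrounded value for the same class (mean 150, standard deviation 5):

```python
from statistics import NormalDist

heights = NormalDist(mu=150, sigma=5)
p_above_160 = 1 - heights.cdf(160)  # 160 is the mean plus two sigma
print(f"P(height > 160) = {p_above_160:.4f}")  # 0.0228
```

The difference from 2.5% comes only from rounding 95.45% down to 95% in the rule of thumb.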
Look at the formula at the top of this slide: $p(x) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{(x-\mu)^2}{2\sigma^2}}$. Let's look at the items in this formula. $x$ is the normal random variable, $\mu$ is the mean, and $\sigma$ is the standard deviation. $\pi$ and $e$ are constants: $\pi \approx 3.14$ and $e \approx 2.718$. Now let's understand the difference between the binomial and Poisson distributions, which we learned earlier, and the normal distribution. In the binomial and Poisson distributions, we were calculating the probability of something specific. In the binomial, for example, we found the probability of getting three heads when we toss four coins, and you get a value there. In the Poisson, we found the probability of getting seven people in the queue in ten minutes. These were represented by bars like these.
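The formula can be written directly as a function. This is a sketch in Python; normal_pdf is a hypothetical helper name, and Python's standard-library statistics.NormalDist is used only as a cross-check. The value it returns is the height y of the curve, not a probability.

```python
import math
from statistics import NormalDist


def normal_pdf(x: float, mu: float, sigma: float) -> float:
    """Height of the normal curve: exp(-(x - mu)^2 / (2 sigma^2)) / (sigma * sqrt(2 pi))."""
    coeff = 1.0 / (sigma * math.sqrt(2.0 * math.pi))
    exponent = -((x - mu) ** 2) / (2.0 * sigma ** 2)
    return coeff * math.exp(exponent)


# Peak height for mean 150, sigma 5 (about 0.0798), and the library's value for comparison
print(normal_pdf(150, 150, 5))
print(NormalDist(150, 5).pdf(150))

# The total area under the curve is 1: a simple Riemann sum from 100 to 200 (mean +/- 10 sigma)
step = 0.01
area = sum(normal_pdf(100 + i * step, 150, 5) * step for i in range(10_000))
print(area)
```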
That's what you get with a discrete distribution. But the normal distribution is a continuous distribution, so you don't get bars, and you don't get a specific probability for a specific value of the random variable. What you get is a continuous curve. So when I write $p(x)$ here, it is not the probability of one particular event. You can think of it as $y$, a function of $x$. When you use this formula to draw the distribution, it looks something like this: with the x axis here, you get a normal distribution, which is a continuous curve.
We already said that the probability of a specific value is zero for any continuous variable. For a continuous variable, we look for the probability of a value being greater than something, less than something, or between two values. So this formula calculates $y$: for any particular value of $x$, say this one, the value of $y$ is this. You plot an infinite number of these points to get the smooth curve: for each value of $x$ you find $y$, join all those $y$ values, and that gives you the normal distribution. Now, in reality you can have any number of normal distributions.
For example, suppose you measure the heights of students in a class, and the heights have an average of 150 and a standard deviation of 10. You will have one normal distribution for that class, with 150 as the average and a standard deviation of 10. If we put minus three sigma here, that is 120, and plus three sigma here, that is 180. So most of this distribution lies between 120 and 180. That is one distribution, for the class. Now take another distribution, say the volume of water in a bottle. The bottle has a mean volume of 500 CC and a standard deviation of 2 CC. For that you will have a different curve, with 500 as the mean and 2 CC as the standard deviation. If I put three sigma on either side, that is plus or minus 6 CC.
So the range is 494 to 506, and this normal distribution looks something like this. This means we can have an infinite number of normal distribution curves. To deal with this, we standardize the normal curve, which gives us the standard normal curve. In the standard normal curve, instead of plotting $x$, we plot $z$, where $z = \frac{x - \mu}{\sigma}$. In the water bottle case, for any value of $x$ we compute $z = \frac{x - 500}{2}$, since $\mu = 500$ and $\sigma = 2$. If we plot these z values, we get the standard normal distribution curve. Any standard normal distribution has a mean of zero and a standard deviation of one, and any normal distribution can be converted into it. That is what z is: the standard score. As we go further, you will see that we calculate z every time, because that lets us use a specific table, the z table, to find the area under the curve over a certain range. We will talk about that.
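Standardizing is a one-line calculation. In this sketch, z_score is a hypothetical helper, the water-bottle numbers (mean 500 CC, sigma 2 CC) come from the example above, and statistics.NormalDist is Python's standard-library normal distribution. It also shows that converting to z does not change the area being measured.

```python
from statistics import NormalDist


def z_score(x: float, mu: float, sigma: float) -> float:
    """Standard score: how many standard deviations x lies from the mean."""
    return (x - mu) / sigma


# Water-bottle example: mean 500 CC, sigma 2 CC
for volume in (494, 500, 506):
    print(volume, z_score(volume, 500, 2))  # z = -3.0, 0.0, 3.0

# The area to the left of 506 CC equals the area to the left of z = 3 on the standard curve
bottles = NormalDist(500, 2)
print(bottles.cdf(506), NormalDist().cdf(z_score(506, 500, 2)))
```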
Here is the table I was talking about for the standard normal distribution. Whatever distribution you have, whether your mean is 150 and your standard deviation is 10, or your mean is 500 and your standard deviation is 2, you can convert it to z values. Once you have converted to z values, you can find the area under the curve using these standard normal distribution tables. You will find this table at the end of any book or binder you buy for exam preparation. So let's say I have a standard normal distribution.
Here is my standard normal distribution curve. We will calculate z in an example shortly, but suppose I calculate z = 1.51. To use the table, find 1.5 in the column of z values down the side; the second decimal place, 0.01, comes from the column headings across the top.

That gives 1.51, and this is the value. It is the area to the right of z = 1.51. The table tells me that the area on the right is 0.0655, or roughly 6.5%. So about 6.5% of the area is to the right of z = 1.51. This is how you use these tables.
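Software can stand in for the printed table. This sketch assumes the table gives the area to the right of z, as the one on the slide does; right_tail_area is a hypothetical helper built on Python's standard-library statistics.NormalDist.

```python
from statistics import NormalDist


def right_tail_area(z: float) -> float:
    """Area to the right of z under the standard normal curve (a right-tail z table)."""
    return 1.0 - NormalDist().cdf(z)


print(f"{right_tail_area(1.51):.4f}")  # 0.0655
```

If your table lists left-hand areas instead, NormalDist().cdf(z) gives those directly.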
Let's look at this example. A company fills perfume into bottles, and each bottle is supposed to contain 150 CC, so the average volume is 150 CC. Based on history, the standard deviation is 2 CC. So this is what we know. The question is: what percentage of bottles will have a volume of more than 153 CC? One way is to draw a normal distribution with 150 CC at the center and steps of 2 CC on either side. What we normally do, though, is convert it into the standard normal distribution: instead of 150 as the mean and this specific standard deviation, we use the standard normal distribution, which has zero as the mean and one as the standard deviation.
For that, we need to convert our data into z values. The mean is 150 CC and sigma is 2 CC. We looked at this formula earlier: $z = \frac{x - \mu}{\sigma}$. Here I am interested in volumes of more than 153 CC. Converting to a z value: x is 153, minus mu of 150, divided by the standard deviation of 2, gives 1.5. So z = 1.5. Now, on the standard normal curve, I need the area to the right of z = 1.5. I can look it up in this table: find 1.5 and the column for 0.00, and it gives an area of 0.0668 to the right of z = 1.5.
And this is what you can see on this slide: the probability of getting more than 153 CC is 0.06681, so 6.681% of bottles are expected to have more than 153 CC. Here is the plot I made in Minitab with mean 150 and standard deviation 2, showing the area to the right of 153 CC. If you want to do the same thing in Microsoft Excel, you can use the NORMDIST formula (NORM.DIST in newer versions): give the value of x, then mu, then sigma, and then TRUE, which means you want the cumulative area. NORMDIST(153, 150, 2, TRUE) gives 0.933, which is not 6%; it is 93.3%. What you need to remember is that the area Excel gives is the area to the left of the point. So whether you are using a table or software, make sure you check which area is being shown: the area to the right of the point or the area to the left of it. In Microsoft Excel, what you get is the area to the left of 153 CC, that is, to the left of z = 1.5.
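The Excel calculation can be mirrored in Python as a check on the arithmetic. NormalDist(mu, sigma).cdf(x) from the standard library returns the cumulative area to the left of x, the same quantity NORMDIST(x, mean, standard_dev, TRUE) returns, so the right-tail answer is one minus it. This is a sketch, not a replacement for the exam tables.

```python
from statistics import NormalDist

mu, sigma = 150, 2          # perfume fill: mean and standard deviation in CC
fill = NormalDist(mu, sigma)

left = fill.cdf(153)        # cumulative area to the LEFT of 153 CC (what Excel reports)
right = 1 - left            # area to the RIGHT, i.e. P(volume > 153 CC)
print(f"left = {left:.4f}, right = {right:.4f}")  # left = 0.9332, right = 0.0668
```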
Many of the tables at the back of books or binders also show the area to the left of the point, so make sure you understand which area you are looking at. If you compute 1 − 0.9332, you get 0.0668, which is again 6.68%. So this is how you use the standard normal distribution for these kinds of calculations when you know the population mean and the population standard deviation. In the second example, I'm looking at the probability that the volume in a bottle is between 148 and 152 CC. We have the same mu and the same sigma, but now we are looking at the area between two points, so we need to calculate a z value for each, z1 and z2. z1 = (148 − 150)/2 = −1, and z2 comes out to be +1. So now I am looking at the area between z = −1 and z = +1. The standard normal distribution is symmetrical, so if this is the standard normal distribution, this is the center.
Whether I go to the left or to the right, the areas are distributed equally. So the area beyond plus one sigma is the same as the area beyond minus one sigma. Looking at the table at z = 1.00, the area to the right is 0.1587, and the area to the left of z = −1 is the same, 0.1587. If I subtract both of these areas from one, I get the area between z = −1 and z = +1: 1 − 2(0.1587) = 0.6826, or 68.26%. You may remember the earlier slide showing the area between plus and minus one sigma: it was 68%, and the area between plus and minus two sigma was 95%. So this confirms what we discussed earlier.
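The between-two-points calculation amounts to two cumulative areas and a subtraction. A short sketch with Python's standard-library statistics.NormalDist, using the perfume numbers (mean 150 CC, sigma 2 CC), computes it directly and again through the symmetry argument:

```python
from statistics import NormalDist

fill = NormalDist(150, 2)
p_between = fill.cdf(152) - fill.cdf(148)  # P(148 < volume < 152)
print(f"{p_between:.4f}")  # 0.6827

# Same answer via z values: subtract both tails beyond z = +/-1 from the total area
tail = 1 - NormalDist().cdf(1.0)  # area to the right of z = +1
print(f"{1 - 2 * tail:.4f}")  # 0.6827
```

The unrounded answer is 68.27%; the 68.26% in the table method comes from rounding the tail area to 0.1587 before doubling it.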
Now let's change this question slightly. In this case, I draw four bottles from the production line and take the average of their volumes. Now I'm interested in the average of these four bottles, not the individual values: what is the probability that the average will be more than 153 CC? Earlier we found the probability of getting one bottle with a volume of more than 153 CC. In this question we are looking at the probability that the average of four bottles is more than 153 CC. If you need to, go back and review the central limit theorem, where we talked about the sampling distribution. Here we are not looking at the distribution of individual items; we are looking at the distribution of averages, in this case the average of four bottles.
There we said that the standard deviation of the sample averages is $\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}}$. We covered this in the central limit theorem, and it is one of the important points here. Now think about it logically. If a production line fills 150 CC on average with a standard deviation of 2 CC, there is a small chance of getting a bottle with more than 153 CC. But there is an even smaller chance of getting an average above 153 CC, because when you draw four items and average them, the average tends to stay close to the mean. Say this is the distribution, with 150 here and 153 here: the chance of one bottle above 153 is small, but with four bottles the chance that their average is above 153 is even smaller, because one bottle might come from the high side and another from the low side.
So the average of the four bottles will be closer to the center. This is what the central limit theorem confirms: the distribution of sample means has a standard deviation of sigma divided by the square root of n. That's what we have done here: instead of taking sigma as 2, we take 2 divided by the square root of 4, since we drew four bottles. This gives 1 as the standard deviation of the sample means. Calculating the z value with this, (153 − 150)/1, gives z = 3 instead of the 1.5 we got earlier. Now, if you look at the table, the area to the right of z = 3 is 0.00135, or only 0.135%.
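The effect of averaging can be checked both analytically and by simulation. This sketch assumes individual fill volumes are normally distributed with the stated mean and sigma; it uses only Python's standard library (math, random, statistics), and the seed and number of trials are arbitrary choices, so the simulated fraction will vary slightly around the analytic value.

```python
import math
import random
from statistics import NormalDist, fmean

mu, sigma, n = 150, 2, 4          # perfume fill (CC) and bottles per sample
se = sigma / math.sqrt(n)         # standard deviation of sample means: sigma / sqrt(n)
z = (153 - mu) / se               # z value for an average of 153 CC

p_single = 1 - NormalDist(mu, sigma).cdf(153)  # one bottle above 153 CC
p_mean = 1 - NormalDist(mu, se).cdf(153)       # average of n bottles above 153 CC
print(se, z)                                   # 1.0 3.0
print(f"{p_single:.5f} {p_mean:.5f}")          # 0.06681 0.00135

# Simulation check: draw many samples of n bottles and count averages above 153 CC
rng = random.Random(42)
trials = 100_000
hits = sum(fmean(rng.gauss(mu, sigma) for _ in range(n)) > 153 for _ in range(trials))
simulated = hits / trials
print(simulated)                               # close to 0.00135
```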