When is kurtosis a problem
Any distribution that is leptokurtic displays greater kurtosis than a mesokurtic distribution.
A leptokurtic distribution is characterized by long tails (outliers). The prefix "lepto-" means "skinny," which makes the shape of a leptokurtic distribution easier to remember. The "skinniness" of a leptokurtic distribution is a consequence of the outliers, which stretch the horizontal axis of the histogram, making the bulk of the data appear in a narrow ("skinny") vertical range.
Thus leptokurtic distributions are sometimes characterized as "concentrated toward the mean," but the more relevant issue, especially for investors, is that occasional extreme outliers cause this appearance of "concentration." Examples of leptokurtic distributions are the t-distributions with small degrees of freedom. The final type of distribution is a platykurtic distribution. These distributions have short tails (a paucity of outliers). The prefix "platy-" means "broad," and it is meant to describe a short and broad-looking peak, but this is a historical error.
Uniform distributions are platykurtic and have broad peaks, but some beta distributions are also platykurtic without having a broad peak. The reason both kinds of distribution are platykurtic is that their extreme values are less extreme than those of the normal distribution. For investors, platykurtic return distributions are stable and predictable, in the sense that there will rarely, if ever, be extreme outlier returns.
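As a rough check on these examples, the excess kurtosis of each distribution can be computed from its standard closed-form expression. The sketch below is illustrative only: the function names are made up for this example, and beta(0.5, 1) is an assumed choice of a platykurtic beta distribution with a sharp peak, not something specified in the text.

```python
import math


def excess_kurtosis_t(df: float) -> float:
    """Excess kurtosis of Student's t-distribution with `df` degrees of freedom."""
    if df <= 2:
        raise ValueError("kurtosis is undefined for df <= 2")
    if df <= 4:
        return math.inf  # the fourth moment diverges for 2 < df <= 4
    return 6.0 / (df - 4.0)


def excess_kurtosis_beta(a: float, b: float) -> float:
    """Excess kurtosis of a beta(a, b) distribution (standard closed form)."""
    numerator = 6.0 * ((a - b) ** 2 * (a + b + 1.0) - a * b * (a + b + 2.0))
    denominator = a * b * (a + b + 2.0) * (a + b + 3.0)
    return numerator / denominator


# Leptokurtic: t-distributions with few degrees of freedom have positive
# excess kurtosis, which shrinks toward 0 as df grows.
for df in (5, 10, 30):
    print(f"t(df={df}): {excess_kurtosis_t(df):+.3f}")

# Platykurtic: the uniform distribution is beta(1, 1); beta(0.5, 1) is an
# assumed example whose density is unbounded at 0 (a very sharp peak).
print(f"uniform:      {excess_kurtosis_beta(1.0, 1.0):+.3f}")
print(f"beta(0.5, 1): {excess_kurtosis_beta(0.5, 1.0):+.3f}")
```

Both platykurtic examples come out negative even though one has a flat top and the other a sharp peak, which is consistent with the point that the tails, not the peak, drive the statistic.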
The goal was to generate a random data set with a specified mean and standard deviation. The histogram for these data is shown in Figure 6 and looks fairly bell-shaped. The skewness and the kurtosis of the data are both close to 0, as you would expect for a normal distribution. These two numbers represent the "true" values of the skewness and kurtosis since they were calculated from all the data.
In real life, you don't know the real skewness and kurtosis because you have to sample the process. This is where the problem begins for skewness and kurtosis. Sample size has a big impact on the results.
The 5,000-point data set above was used to explore what happens to skewness and kurtosis based on sample size. For example, suppose we wanted to determine the skewness and kurtosis for a sample size of 5. Five points are drawn at random from the data set and the two statistics are calculated.
This was repeated for the sample sizes shown in Table 1. Notice how different the results are when the sample size is small compared to the "true" skewness and kurtosis for the 5,000 results. For a sample size of 25, both the skewness and the kurtosis had signs opposite to the true values, which would lead to wrong conclusions about the shape of the distribution.
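The experiment described above can be sketched in a few lines of Python. This is only an approximation of the procedure, not the workbook's macro: the target mean of 100 and standard deviation of 10, the seed, the list of sample sizes, and the helper names are all assumptions made for illustration, and the simple moment-based estimators used here may differ slightly from the bias-corrected formulas found in spreadsheet software.

```python
import random


def skewness(xs):
    """Moment-based sample skewness: m3 / m2**1.5."""
    n = len(xs)
    mean = sum(xs) / n
    m2 = sum((x - mean) ** 2 for x in xs) / n
    m3 = sum((x - mean) ** 3 for x in xs) / n
    return m3 / m2 ** 1.5


def excess_kurtosis(xs):
    """Moment-based sample excess kurtosis: m4 / m2**2 - 3."""
    n = len(xs)
    mean = sum(xs) / n
    m2 = sum((x - mean) ** 2 for x in xs) / n
    m4 = sum((x - mean) ** 4 for x in xs) / n
    return m4 / m2 ** 2 - 3.0


def skewness_range(population, n, repeats, rng):
    """Smallest and largest skewness over `repeats` random subsamples of size n."""
    values = [skewness(rng.sample(population, n)) for _ in range(repeats)]
    return min(values), max(values)


rng = random.Random(0)  # fixed seed so the run is repeatable
# Assumed targets: mean 100, standard deviation 10 (not stated in the text).
population = [rng.gauss(100.0, 10.0) for _ in range(5000)]

print("all data:", skewness(population), excess_kurtosis(population))
for n in (5, 25, 100, 1000):  # assumed sample sizes
    sample = rng.sample(population, n)
    print(n, skewness(sample), excess_kurtosis(sample))
```

Calling `skewness_range(population, 30, 100, rng)` repeats the draw for one sample size and returns the extremes, which makes the spread at small sample sizes easy to compare with the spread at larger ones.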
There appears to be a lot of variation in the results based on sample size. Figure 7 shows how the skewness changes with sample size. Figure 8 is the same but for kurtosis. A subgroup of size 30 was then randomly selected from the data set, and this was repeated a number of times. Both statistics varied widely from one subgroup to the next. What kind of decisions can you make about the shape of the distribution when the skewness and kurtosis vary so much?
Essentially, no decisions. The skewness and kurtosis statistics appear to be very dependent on the sample size. The table above shows the variation. In fact, even several hundred data points didn't give very good estimates of the true kurtosis and skewness. Smaller sample sizes can give results that are very misleading. Dr. Wheeler wrote in his book mentioned above:
"Shewhart made this observation in his first book. The statistics for skewness and kurtosis simply do not provide any useful information beyond that already given by the measures of location and dispersion."
So, don't put much emphasis on skewness and kurtosis values you may see. And remember, the more data you have, the better you can describe the shape of the distribution. But, in general, it appears there is little reason to pay much attention to skewness and kurtosis statistics. Just look at the histogram.
It often gives you all the information you need.
Thanks so much for reading our publication. We hope you find it informative and useful. Happy charting and may the data always support your position.
Below is the e-mail Dr. Westfall sent concerning the description of kurtosis as a measure of peakedness. It is printed with his permission. It led to the rewriting of the article to remove the peakedness definition of kurtosis.
Thank you for making your information publicly available.
I often point students to the internet for supplemental information, and some of yours is valuable.
Thus, if you see a large kurtosis statistic, you know you have a quality control problem that warrants further investigation.
Kurtosis is estimated by the average of the fourth powers of the z-values. Subtract 3 if you want excess kurtosis. Now, replace the last data value with a far larger number so it becomes an outlier: 0, 3, 4, 1, 2, 3, 0, 2, 1, 3, 2, 0, 2, 2, 3, 2, 5, 2, 3, followed by the outlier. The average is now much larger. Clearly, only the outlier(s) matter. Nothing about the "peak" or the data near the middle matters. Further, it is clear that kurtosis has very positive implications for SPC in its detection of outliers.
Here is a paper that elaborates: Westfall, P. Kurtosis as Peakedness. The American Statistician, 68. May I suggest that you either modify or remove your description of kurtosis? It does a disservice to consumers and users of statistics, and ultimately harms your own business because it presents information that is completely off the mark as factual.
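The calculation Dr. Westfall describes can be sketched as follows. Only the first 19 data values appear in the text; the 20th value of the original data set and the number used as the outlier are not given here, so the values 1 and 999 below are assumptions chosen purely for illustration, as is the helper name `kurtosis_from_z`.

```python
def kurtosis_from_z(values):
    """Return (kurtosis estimate, list of z**4), using the population standard deviation."""
    n = len(values)
    mean = sum(values) / n
    sd = (sum((v - mean) ** 2 for v in values) / n) ** 0.5
    z4 = [((v - mean) / sd) ** 4 for v in values]
    return sum(z4) / n, z4


# The first 19 values are the ones listed in the e-mail.
first_19 = [0, 3, 4, 1, 2, 3, 0, 2, 1, 3, 2, 0, 2, 2, 3, 2, 5, 2, 3]

ASSUMED_LAST_VALUE = 1  # assumption: the original 20th value is not given in the text
ASSUMED_OUTLIER = 999   # assumption: the replacement value is not given in the text

k_base, _ = kurtosis_from_z(first_19 + [ASSUMED_LAST_VALUE])
k_out, z4_out = kurtosis_from_z(first_19 + [ASSUMED_OUTLIER])

print(f"kurtosis without outlier: {k_base:.2f}")
print(f"kurtosis with outlier:    {k_out:.2f}")
print(f"share of the z^4 total from the outlier: {z4_out[-1] / sum(z4_out):.4f}")
```

Under these assumptions, almost all of the z^4 total comes from the single outlier, so the other 19 values have practically no influence on the statistic.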
I have many samples, each with, say, 50 cases. For each sample I compute the skewness and kurtosis from the 50 observations. In the scatter plot of sample skewness against sample kurtosis I observe a curved cloud of data points.
When I used simulated data sets with 50 measurements each, generated from an exponential distribution, I again found the curved cloud of scatter points. Theoretically, however, the skewness is equal to 2 and the excess kurtosis equal to 6. Can you elaborate on this?
A very informative and insightful article. But one small typo, I think.
In the description of Figure 3 it was mentioned that "Figure 3 is an example of dataset with negative skewness. The right-hand tail will typically be longer than the left-hand tail." Please correct me if I am wrong. Thanks, Pavan.
A common check is whether the skewness falls within twice the standard error of skewness. If it does, we can consider the distribution to be approximately normal.
Another descriptive statistic that can be derived to describe a distribution is called kurtosis. It refers to the relative concentration of scores in the center, the upper and lower ends (tails), and the shoulders of a distribution (see Howell).
In general, kurtosis is not very important for an understanding of statistics, and we will not be using it again. However, it is worth knowing the main terms here. Traditionally, a distribution is called platykurtic if it is flatter than the corresponding normal curve and leptokurtic if it is more peaked than the normal curve, although, as noted above, kurtosis is really governed by the tails of the distribution rather than by its peak.