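Determining quantiles requires the order statistics of the data. One definition, taken from order statistics, is as follows. Let $x_{(1)} \le x_{(2)} \le \dots \le x_{(n)}$ be the sorted data.
Our goal is to find the value $x_p$ that is the fraction $p$ of the way through the ordered data. Writing the rank as $(n-1)p + 1 = k + t$, with integer part $k$ and fractional part $t \in [0, 1)$, we interpolate linearly between neighboring order statistics:
$$x_p = x_{(k)} + t\,\bigl[x_{(k+1)} - x_{(k)}\bigr] = (1-t)\,x_{(k)} + t\,x_{(k+1)}$$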
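To make the interpolation concrete, here is a minimal Python sketch. It assumes the rank is (n − 1)p + 1, split into an integer part k and a fractional part t, which is the R7 convention this post refers to. The function name `quantile_r7` is made up for illustration.

```python
import math

def quantile_r7(data, p):
    """Linear-interpolation quantile, assuming rank r = (n - 1) * p + 1 (R7)."""
    if not data:
        raise ValueError("data must be non-empty")
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must lie in [0, 1]")
    xs = sorted(data)        # order statistics x(1) <= ... <= x(n)
    n = len(xs)
    r = (n - 1) * p + 1      # 1-based fractional rank
    k = math.floor(r)        # integer part of the rank
    t = r - k                # fractional part of the rank
    if k >= n:               # p == 1: there is no x(k+1), return the maximum
        return xs[-1]
    # (1 - t) * x(k) + t * x(k+1), shifted to 0-based list indexing
    return (1 - t) * xs[k - 1] + t * xs[k]
```

For example, `quantile_r7([1, 2, 3, 4], 0.5)` has rank 2.5, so k = 2 and t = 0.5, and the result lies halfway between the second and third sorted values.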
However, this is only one of the nine ways to compute quantiles listed in Wikipedia (R7 there), and not even the best one. Its use is a legacy of the computational load in the past (see this article). That article also discusses the best estimation method, R8 in Wikipedia, which is connected to the Tukey plotting position formula through the CDF, discussed later.

Sometimes we want to compare two distributions: for example, to see whether two empirical distributions share common features, or whether an empirical distribution can be fitted by a theoretical distribution. Histograms and ECDFs have been widely used for fitting a theoretical distribution, although histogram results rely heavily on the bin width. A quantile-quantile (Q-Q) plot is a more robust way to do the comparison. A Q-Q plot is a scatterplot in which each point pairs a data value with the corresponding estimate for that value derived from the quantile function of the fitted distribution. Note that the quantile function is the inverse of the cumulative distribution function, so the plotting-position methods for the CDF are the inverses of the quantile estimation methods. This article about qq_plot is a very detailed and clear online resource for understanding the basis of CDF matching.
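As a rough illustration of how a Q-Q plot is assembled, the sketch below computes the point coordinates with Tukey plotting positions p_i = (i − 1/3)/(n + 1/3), which is what inverting the R8 rank (n + 1/3)p + 1/3 gives. The helper names are hypothetical. Fitting a normal distribution by the sample mean and standard deviation is an assumption made only for this example, and the axis order (theoretical first, observed second) is a choice.

```python
from statistics import NormalDist, mean, stdev

def tukey_positions(n):
    """Tukey plotting positions p_i = (i - 1/3) / (n + 1/3) for i = 1..n."""
    return [(i - 1 / 3) / (n + 1 / 3) for i in range(1, n + 1)]

def qq_points(data, dist=None):
    """Return (theoretical quantile, observed value) pairs for a Q-Q plot.

    If no distribution is given, a normal fitted by the sample mean and
    standard deviation is used (an assumption for this sketch; needs n >= 2).
    """
    xs = sorted(data)                        # observed order statistics
    if dist is None:
        dist = NormalDist(mean(xs), stdev(xs))
    # Quantile function = inverse CDF, evaluated at the plotting positions
    theo = [dist.inv_cdf(p) for p in tukey_positions(len(xs))]
    return list(zip(theo, xs))
```

Passing the pairs to any scatterplot routine gives the Q-Q plot; points lying close to the line y = x suggest the fitted distribution describes the data well.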