Dispersion

Categorical variables
A frequency distribution summarizes the values of a variable by the frequencies with which they occur.
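A minimal Python sketch of a frequency distribution, using the standard library's `collections.Counter`. The helper name `frequency_distribution` and the eye-colour data are hypothetical, chosen only for illustration.

```python
from collections import Counter

def frequency_distribution(values):
    """Return (value, count) pairs, most frequent value first."""
    return Counter(values).most_common()

# Hypothetical categorical data: eye colours of ten people
eyes = ["brown", "blue", "brown", "green", "blue",
        "brown", "brown", "blue", "green", "brown"]

for value, count in frequency_distribution(eyes):
    # Absolute frequency and relative frequency (share of all observations)
    print(f"{value}: {count} ({count / len(eyes):.0%})")
```

For these data the output lists brown 5 (50%), blue 3 (30%) and green 2 (20%).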
Numerical variables
[br][color=#0000ff]Range[/color] is the difference between the maximum and the minimum values. It cannot decrease when observations are added to the data. Because it depends only on the two most extreme observations, it is very sensitive to extreme values.  [br][br][br][color=#0000ff]Interquartile range (IQR, q[sub]r[/sub])[/color] is the difference between the upper and the lower quartile. At least 50 % of the values lie in this interval. [color=#0000ff]Quartile deviation is [math]\Large \textcolor{blue}{q=\frac{q_r}{2}}[/math][/color]. Both the interquartile range and the quartile deviation can either increase or decrease when observations are added to the data.[br][br][color=#0000ff]Deviation[/color] of an observation [i]i[/i] is the difference between its value and the mean:  [math]\Large x_i -\overline x.[/math] When deviations are added up, positive and negative values cancel each other out (their sum is always exactly zero), so the plain sum hides the dispersion. For that reason, absolute values of the deviations are used:  [br][br]   [math]\Large \sum_{i=1}^n\mid x_i - \overline x\mid .[/math][br][br]When adding up the deviations, the number of observations matters. Even if the deviations were of about the same magnitude, the sum would be much larger for 1000 observations than for 50 observations. To make the sums comparable, they are divided by the number of observations, which gives the [color=#0000ff]mean deviation[/color]:  [br][br][br]   [math]\Large \sum_{i=1}^n\frac{\mid x_i - \overline x\mid}{n}.[/math][br] [br][br]When squared deviations are used instead of absolute values, the result is called the [color=#0000ff]variance[/color]. Variance does not depend on location: adding the same constant to every observation leaves it unchanged. 
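The measures above can be computed with a short Python sketch. The helper name `dispersion_summary` and the data are hypothetical. Several conventions exist for computing quartiles; this sketch assumes the "inclusive" method of the standard library's `statistics.quantiles`, and other software may give slightly different quartiles.

```python
import statistics

def dispersion_summary(data):
    """Range, IQR, quartile deviation and mean deviation of numerical data."""
    xs = sorted(data)
    n = len(xs)
    mean = statistics.fmean(xs)
    # Quartile convention is an assumption: "inclusive" interpolation
    q1, _median, q3 = statistics.quantiles(xs, n=4, method="inclusive")
    iqr = q3 - q1
    return {
        "range": xs[-1] - xs[0],
        "iqr": iqr,
        "quartile_deviation": iqr / 2,
        # Positive and negative deviations cancel: this is always zero
        "sum_of_deviations": sum(x - mean for x in xs),
        # Absolute deviations, divided by n to make samples comparable
        "mean_deviation": sum(abs(x - mean) for x in xs) / n,
    }

data = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical data, mean 5
print(dispersion_summary(data))
```

For these data the range is 7, the quartiles are 4.0 and 5.5 (IQR 1.5, quartile deviation 0.75), the sum of deviations is 0 and the mean deviation is 12 / 8 = 1.5.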
If the variance is computed [color=#0000ff]for the whole population[/color], the divisor is the number of observations [color=#0000ff][i]n[/i][/color] and the variance is denoted by [math]\Large \sigma^2[/math]: [br][br]  [math]\Large \sigma^2= \frac{1}{n}\sum_{i=1}^n(x_i - \overline x)^2.[/math][br][br]If the data are a sample from a larger population, the divisor is [color=#0000ff] [i]n [/i]- 1[/color] and the variance is denoted by [color=#0000ff][math]\Large s^2[/math][/color]: [br][br]  [math]\Large s^2= \frac{1}{n-1}\sum_{i=1}^n(x_i - \overline x)^2.[/math][br][br]The divisor [i]n[/i] - 1 makes the sample variance an unbiased estimator of the population variance: its expected value equals the population variance, so over repeated samples its values are centred on the estimated parameter. The difference between the two divisors matters most when the sample is small.   [br][br]Because variance uses squared deviations, its unit is also squared (for example, squared dollars for incomes measured in dollars). The result is much easier to interpret after taking the square root, which is called the [color=#0000ff]standard deviation[/color] (denoted [i]s[/i] for a sample and σ for a population). [br][br]To compare the dispersion of the same property measured in different units, the [color=#0000ff]coefficient of variation[/color] can be used:[br][br]   [math]\Large v=\frac{s}{\overline x}.[/math][br][br]It is meaningful only for variables measured on a ratio scale.  [br][br][color=#0000ff]Example[/color]: In one year, the mean annual income in the USA was $20000 and the standard deviation $10000. In the same year, the mean annual income in Great Britain was £6000 and the standard deviation £4000. The coefficient of variation for incomes is 10000 / 20000 = 0.5 in the USA and 4000 / 6000 ≈ 0.67 in Great Britain, so the relative dispersion is larger in Great Britain. [br][br][br]The same kind of comparison could be made for the weights of mice and elephants. [br][br]
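A minimal Python sketch of the population and sample variance, the standard deviation and the coefficient of variation, written directly from the formulas above. The helper names and the data are hypothetical; the results are cross-checked against the standard library's `statistics.pvariance` and `statistics.variance`. The last two lines reproduce the income example.

```python
import math
import statistics

def population_variance(xs):
    """sigma^2: sum of squared deviations from the mean, divided by n."""
    mean = statistics.fmean(xs)
    return sum((x - mean) ** 2 for x in xs) / len(xs)

def sample_variance(xs):
    """s^2: sum of squared deviations from the mean, divided by n - 1."""
    mean = statistics.fmean(xs)
    return sum((x - mean) ** 2 for x in xs) / (len(xs) - 1)

def coefficient_of_variation(sd, mean):
    """v = s / mean; meaningful only on a ratio scale with a positive mean."""
    return sd / mean

data = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical data, mean 5
print(population_variance(data))             # 32 / 8 = 4.0
print(sample_variance(data))                 # 32 / 7, about 4.571
print(math.sqrt(population_variance(data)))  # standard deviation 2.0

# Adding the same constant to every value does not change the variance
print(population_variance([x + 100 for x in data]))  # 4.0

# Income example: USA and Great Britain
print(round(coefficient_of_variation(10000, 20000), 2))  # 0.5
print(round(coefficient_of_variation(4000, 6000), 2))    # 0.67
```

The two variance functions differ only in the divisor, which is the whole distinction between the population formula and the sample formula.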
Larry Gonick & Woollcott Smith: The Cartoon Guide to Statistics
