Five Main Characteristics of Data
[b]1. Center: [/b][color=#c00000][u]Where the middle of the data set is located. [/u][/color]Mean, median, and mode are the most commonly used measures of center.[br][b]2. Variation (Spread): [/b][color=#c00000][u]A measure of the amount that the data values vary. [/u][/color]Standard deviation, variance, and range are the most commonly used measures of variation. (Variance measures variation.)[br][b]3. Distribution (Shape): [/b][color=#c00000][u]The nature or shape of the spread of the data over the range of values. [/u][/color]Generic shapes include bell-shaped, right skewed, left skewed, flat, and U-shaped. Specific distributions include the binomial, normal, t, uniform, Poisson, and chi-squared distributions.[br][b]4. Outliers: [/b][color=#c00000][u]Sample values that lie very far away from the vast majority of the other sample values.[/u][/color][br][b]5. Time: [/b][color=#c00000][u]Any change in the characteristics of the data over time.[/u][/color][br]A mnemonic to remember the five characteristics of data is [b]CVDOT[/b], or the sentence “[b]C[/b]omputer [b]V[/b]iruses [b]D[/b]estroy [b]O[/b]r [b]T[/b]erminate.”[br]All of these characteristics can be very informative.
What does this describe?
Standard Deviation
What does this describe?
Median
What does this describe?
Right Skewed
What does this describe?
In A.D. 2015
What does this describe?
Variance
Measures of Center with Example
After extensive research, Mr. Klapheck recorded the number of feet that each of 40 ten-year-old children would chase him while he rode a unicorn: Feet = {1880, 160, 1500, 560, 640, 2180, 1240, 760, 480, 1940, 940, 740, 1320, 580, 660, 1100, 820, 1160, 1140, 1120, 1260, 760, 1360, 680, 440, 1540, 1760, 1260, 420, 2000, 1500, 520, 740, 1620, 1500, 1340, 820, 660, 1200, 2960} (in feet).
If we are just studying these forty children, then they are the population. If we are studying a larger population (for example, the endurance of all children in California), then these forty children are a sample.[br][br]For now, we are just studying these forty children.
Sample Mean
If we take the first three data values as a sample, {1880, 160, 1500}, we can easily find the mean by hand.[br]Add all the sample values together.[br]Divide by the number of values.[br]Then the sample mean is _ _.
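The two steps above can be sketched in Python. This is only an illustration under the assumption that Python is available; the activity itself uses GeoGebra, and the names `sample`, `total`, and `sample_mean` are hypothetical.

```python
# First three data values from the Feet list, treated as a sample.
sample = [1880, 160, 1500]

# Step 1: add all the sample values together.
total = sum(sample)

# Step 2: divide by the number of values.
sample_mean = total / len(sample)

print(sample_mean)
```

Run the sketch to check the value you found by hand.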
Is this mean a parameter or a statistic?
Mean using Technology
GeoGebra Graphing and most graphing calculators can do this in one step with the Mean( ) function.[br]Because our data is labeled Feet, we use Mean(Feet).[br]Find the mean distance run by the forty children.
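For readers without GeoGebra, a rough Python equivalent of Mean(Feet) might look like the sketch below. It assumes Python's standard `statistics` module; the variable names `feet` and `mean_feet` are hypothetical.

```python
import statistics

# The forty recorded distances, in feet, copied from the activity.
feet = [1880, 160, 1500, 560, 640, 2180, 1240, 760, 480, 1940,
        940, 740, 1320, 580, 660, 1100, 820, 1160, 1140, 1120,
        1260, 760, 1360, 680, 440, 1540, 1760, 1260, 420, 2000,
        1500, 520, 740, 1620, 1500, 1340, 820, 660, 1200, 2960]

# One-step mean, playing the role of GeoGebra's Mean(Feet).
mean_feet = statistics.mean(feet)

print(mean_feet)
```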
Is this mean a parameter or a statistic?
Finding Mode using DotPlot
The mode is the value that occurs most frequently.[br]Sorting the data helps us find the value that occurs most frequently.[br]Looking at the dot plot, what is the mode of the distances these children ran?
Finding Mode using Technology
Find the exact answer using the Mode( <data> ) function.
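As a cross-check outside GeoGebra, the same idea can be sketched in Python. This assumes Python 3.8 or later, where `statistics.multimode` exists; it returns every value tied for most frequent, so it also handles data with more than one mode. The names `feet`, `counts`, and `modes` are hypothetical.

```python
import statistics
from collections import Counter

# The forty recorded distances, in feet, copied from the activity.
feet = [1880, 160, 1500, 560, 640, 2180, 1240, 760, 480, 1940,
        940, 740, 1320, 580, 660, 1100, 820, 1160, 1140, 1120,
        1260, 760, 1360, 680, 440, 1540, 1760, 1260, 420, 2000,
        1500, 520, 740, 1620, 1500, 1340, 820, 660, 1200, 2960]

# Count how often each distance occurs, as a dot plot does visually.
counts = Counter(feet)

# List of all values that share the highest count.
modes = statistics.multimode(feet)

print(modes, [counts[m] for m in modes])
```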
Is this mode a parameter or a statistic?
Finding Median part 1
We can find the median by first sorting the data.[br]Sort(Feet) = {160, 420, 440, 480, 520, 560, 580, 640, 660, 660, [br]680, 740, 740, 760, 760, 820, 820, 940, 1100, 1120, [br]1140, 1160, 1200, 1240, 1260, 1260, 1320, 1340, 1360, 1500, [br]1500, 1500, 1540, 1620, 1760, 1880, 1940, 2000, 2180, 2960}[br]Then we find the value in the middle. Because 40 is even, there will be two middle values.[br]What are the two middle values?
Finding Median part 2
We take the average (mean) of these two values to find the median.[br]So, the median distance these forty children ran was _ _.
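The sort-then-average procedure from parts 1 and 2 can be sketched in Python as follows. This is an illustration only, not part of the GeoGebra activity; the names `feet`, `ordered`, `lower_middle`, `upper_middle`, and `median_by_hand` are hypothetical.

```python
import statistics

# The forty recorded distances, in feet, copied from the activity.
feet = [1880, 160, 1500, 560, 640, 2180, 1240, 760, 480, 1940,
        940, 740, 1320, 580, 660, 1100, 820, 1160, 1140, 1120,
        1260, 760, 1360, 680, 440, 1540, 1760, 1260, 420, 2000,
        1500, 520, 740, 1620, 1500, 1340, 820, 660, 1200, 2960]

# Part 1: sort the data and pick the two middle values.
ordered = sorted(feet)
n = len(ordered)                    # 40 values, an even count
lower_middle = ordered[n // 2 - 1]  # 20th value (index 19)
upper_middle = ordered[n // 2]      # 21st value (index 20)

# Part 2: the median is the mean of the two middle values.
median_by_hand = (lower_middle + upper_middle) / 2

# The library function should agree with the manual result.
assert median_by_hand == statistics.median(feet)

print(lower_middle, upper_middle, median_by_hand)
```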
Is this median a parameter or a statistic?
Find Median using Technology
We can also use the GeoGebra function Median( <data> ).[br]Using technology, what is the median distance run by the children?
If we were studying the endurance of children in the USA, then what would be the population and what would be the sample?
For the sample of numbers {1, 2, 3, 4, 4}, find the three measures of center.
For the sample of colors {green, green, white, yellow, yellow}, find the center.
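To check your answers to the two questions above, here is a minimal Python sketch. It assumes Python 3.8 or later for `statistics.multimode`; the variable names are hypothetical. Colors are categories rather than numbers, so they cannot be averaged, and the sketch computes only the mode for them.

```python
import statistics

numbers = [1, 2, 3, 4, 4]
colors = ["green", "green", "white", "yellow", "yellow"]

# All three measures of center make sense for numeric data.
numbers_mean = statistics.mean(numbers)
numbers_median = statistics.median(numbers)
numbers_modes = statistics.multimode(numbers)

# For categories, only the most frequent value(s) can be reported.
color_modes = statistics.multimode(colors)

print(numbers_mean, numbers_median, numbers_modes)
print(color_modes)
```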
Range, Standard Deviation, and Variance
Order the following dotplots from least variation to most variation.
A
B
C
D
Compare
From the dotplots above, can you order the data sets A, B, C, and D from least variation to most variation (that is, from least spread out to most spread out)?
Range
We can find the range of these data sets by taking the largest number and subtracting the smallest number. [br]What is the range of these data sets?
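The values behind dotplots A to D are not listed in the text, so the sketch below applies the same rule, largest minus smallest, to the Feet data from the earlier example instead. It is written in Python as an illustration; the names `feet` and `data_range` are hypothetical.

```python
# The forty recorded distances, in feet, copied from the activity.
feet = [1880, 160, 1500, 560, 640, 2180, 1240, 760, 480, 1940,
        940, 740, 1320, 580, 660, 1100, 820, 1160, 1140, 1120,
        1260, 760, 1360, 680, 440, 1540, 1760, 1260, 420, 2000,
        1500, 520, 740, 1620, 1500, 1340, 820, 660, 1200, 2960]

# Range = largest value minus smallest value.
data_range = max(feet) - min(feet)

print(data_range)
```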
Notice how all four data sets have the same range despite some being more spread out than others.[br]Range is a basic way to describe variation in data, but we need more advanced measurements.[br]We will use technology to measure standard deviation and variance.[br][br]What are three ways we can measure variation?
Definitions
The [i][b]standard deviation[/b][/i] of a data set is a measure of variation based on how far each data value deviates, or differs, from the mean.[br]The standard deviation[br][list][*]is never negative.[br][/*][*]is [color=#ff0000][u]zero[/u][/color] if all the data values are [color=#ff0000][u]equal[/u][/color], and gets larger as the data spread out.[br][/*][*]has the [color=#ff0000][u]same units[/u][/color] as the original data.[br][/*][*]can be highly [color=#ff0000][u]influenced by outliers[/u][/color], similar to the mean.[br][/*][/list][br]The square of the standard deviation is called the [b]variance[/b].[br]Which of the following is true?
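The properties listed above can be checked numerically with a short Python sketch. The three small data sets are made up for illustration and do not come from the activity. The sketch uses the population functions `pstdev` and `pvariance` from Python's `statistics` module; the sample versions (`stdev`, `variance`) divide by n - 1 instead of n but show the same behavior.

```python
import math
import statistics

# Hypothetical data sets, chosen only to illustrate the properties.
equal_values = [5, 5, 5, 5]        # no variation at all
data = [2, 4, 4, 4, 5, 5, 7, 9]    # some variation around a mean of 5
with_outlier = data + [50]         # same data plus one extreme value

sd_equal = statistics.pstdev(equal_values)
sd = statistics.pstdev(data)
var = statistics.pvariance(data)
sd_outlier = statistics.pstdev(with_outlier)

# Zero when all values are equal, never negative otherwise.
print(sd_equal, sd >= 0)

# Variance is the square of the standard deviation.
print(math.isclose(sd ** 2, var))

# A single outlier inflates the standard deviation.
print(sd_outlier > sd)
```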