Alternative hypothesis The alternative hypothesis, denoted $H_1$ (read "H-one"), is a claim to be tested.
Analysis of Variance (ANOVA) An inferential method that is used to test the equality of three or more population means.
Arithmetic mean The arithmetic mean of a variable is computed by determining the sum of all the values of the variable in the data set and dividing by the number of observations.
Bar graph A bar graph is constructed by labeling each category of data on a horizontal axis and the frequency or relative frequency of the category on the vertical axis. A rectangle of equal width is drawn for each category. The height of the rectangle is equal to the category's frequency or relative frequency.
Bell-shaped distribution A probability distribution in which the highest frequency occurs in the middle and frequencies tail off to the left and right of the middle.
Biased A statistic is biased when it consistently overestimates or underestimates a parameter.
Bimodal A data set is bimodal if it has two modes.
Binomial experiment An experiment is said to be a binomial experiment provided
1. The experiment is performed a fixed number of times. Each repetition of the experiment is called a trial.
2. The trials are independent.
3. For each trial, there are two mutually exclusive outcomes: success or failure.
4. The probability of success is the same for each trial of the experiment.
Bivariate data Data in which two variables are measured for each individual in the study.
Categorical variables Qualitative, or categorical, variables allow for classification of individuals based on some attribute or characteristic.
Census A list of all individuals in a population along with certain characteristics of each individual.
Chi-square independence test The chi-square independence test is used to find out whether there is an association between a row variable and a column variable in a contingency table constructed from sample data. The null hypothesis is that the variables are not associated; in other words, they are independent. The alternative hypothesis is that the variables are associated, or dependent.
Chi-square test for homogeneity of proportions A test of the claim that different populations have the same proportion of individuals with some characteristic.
Claim The explicit statement of the problem. The statement should provide the experimenter with direction. In addition, the statement must identify the response variable and the population to be studied.
Class A category of data that is created by an interval of numbers.
Class midpoint The class midpoint is found by adding a class's lower class limit and upper class limit and dividing the result by 2. That is, $\text{class midpoint} = \dfrac{\text{lower class limit} + \text{upper class limit}}{2}$.
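The class midpoint definition, together with the class width entry that follows it, can be sketched in a few lines of Python. The class limits below are hypothetical values chosen only for illustration.

```python
def class_midpoint(lower_limit: float, upper_limit: float) -> float:
    """Average of a class's lower and upper class limits."""
    return (lower_limit + upper_limit) / 2


# Hypothetical (lower class limit, upper class limit) pairs.
classes = [(20, 29), (30, 39), (40, 49)]

midpoints = [class_midpoint(lo, hi) for lo, hi in classes]

# Class width: difference between consecutive lower class limits.
widths = [nxt[0] - cur[0] for cur, nxt in zip(classes, classes[1:])]

print(midpoints)  # [24.5, 34.5, 44.5]
print(widths)     # [10, 10]
```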
Class width The difference between consecutive lower class limits.
Closed question A question in which the respondent must choose from a list of predetermined responses.
Cluster sample A cluster sample is obtained by selecting all individuals within a randomly selected collection or group of individuals.
Coefficient of determination The coefficient of determination, $R^2$, measures the percentage of total variation in the response variable that is explained by the least-squares regression line.
Column variable The category represented by the columns of a contingency table.
Combination An arrangement, without regard to order, of n distinct objects without repetitions. The symbol ${}_nC_r$ represents the number of combinations of n distinct objects taken r at a time.
Common logarithm If $a = 10$ in the expression $y = \log_a x$, the resulting logarithm, $y = \log_{10} x$, is called the common logarithm. It is common practice to omit the base, a, when it is equal to 10 and write the common logarithm as $y = \log x$.
Completely randomized design An experiment in which each experimental unit is randomly assigned to a treatment.
Complement Let S denote the sample space of a probability experiment and let E denote an event. The complement of E is the set of all simple events in S that are not in E.
Conditional distribution A conditional distribution lists the relative frequency of each category of a variable, given a specific value of the other variable in the contingency table.
Conditional probability The notation $P(F \mid E)$ is read "the probability of event F given event E." It is the probability of an event F given the occurrence of the event E.
Confidence intervals Intervals constructed about the predicted value of y, at a given level of x, that are used to measure the accuracy of the mean response of all the individuals in the population.
Confidence interval estimate A confidence interval estimate of a parameter consists of an interval of numbers, along with a probability that the interval contains the unknown parameter.
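As a small illustration of the Combination entry above (and of the Permutation entry later in this glossary), Python's standard `math` module can count both. The factorial identities asserted in the sketch are the usual textbook formulas, not something this glossary states, and the values of `n` and `r` are arbitrary.

```python
from math import comb, factorial, perm

n, r = 5, 3  # arbitrary illustrative values

# Number of combinations of n distinct objects taken r at a time (order ignored).
print(comb(n, r))  # 10

# Number of ordered arrangements of r objects chosen from n distinct objects.
print(perm(n, r))  # 60

# Usual factorial forms: nPr = n! / (n - r)!  and  nCr = nPr / r!
assert perm(n, r) == factorial(n) // factorial(n - r)
assert comb(n, r) == perm(n, r) // factorial(r)
```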
Confounding When the effects of two predictor variables on the response variable cannot be distinguished.
Consistent A statistic is consistent if, the larger the sample, the closer the sample value of the statistic gets to the value of the population parameter.
Contingency table A table relating two categories of data.
Continuous data Observations corresponding to a continuous variable.
Continuous random variable A random variable that has an infinite number of possible values that are not countable.
Continuous variable A quantitative variable that has an infinite number of possible values that are not countable.
Convenience sample A sample in which the individuals are easily obtained.
Critical region The critical region or rejection region is the set of all values such that the null hypothesis is rejected.
Critical value The maximum number of standard deviations the sample mean can be from $\mu_0$ before the null hypothesis is rejected.
Cumulative frequency distribution A cumulative frequency distribution displays the aggregate frequency of the category. In other words, for discrete data, it displays the total number of observations less than or equal to the category. For continuous data, it displays the total number of observations less than or equal to the upper class limit of a class.
Cumulative relative frequency distribution A cumulative relative frequency distribution displays the aggregate proportion (or percentage) of observations less than or equal to the category.
Dependent Two events E and F are dependent if the occurrence of event E in a probability experiment affects the probability of event F.
Dependent sampling A sampling method is dependent when the individuals selected to be in one sample are used to determine the individuals to be in the second sample.
Descriptive statistics Descriptive statistics consists of organizing and summarizing the information collected.
Designed experiment A controlled study in which one or more treatments are applied to experimental units. The experimenter then observes the effect of varying these treatments on a response variable. Control, manipulation, randomization, and replication are the key ingredients of a well-designed experiment.
Deviation about the mean The difference between an observation and the arithmetic mean.
Deviations Differences between predicted values and observed values.
Discrete data Observations corresponding to a discrete variable.
Discrete random variable A random variable that has either a finite number of possible values or a countable number of possible values.
Discrete variable A quantitative variable that has either a finite number of possible values or a countable number of possible values. The term "countable" means that the values result from counting such as 0, 1, 2, 3, and so on.
Disjoint If events E and F have no simple events in common or cannot occur simultaneously, they are said to be disjoint or mutually exclusive.
Distribution-free procedures Inferential procedures that are not based upon parameters and that require fewer assumptions to be satisfied in order to perform the tests. They do not require that the population follow a specific type of distribution (such as the normal distribution). (See Nonparametric statistical procedures.)
Double blind A designed experiment in which neither the experimental unit nor the experimenter knows what treatment is being administered to the experimental unit.
Efficient A statistic is efficient if, in repeated samples, a majority of the sample values of the statistic are "close" to the value of the population parameter.
Empirical Rule, The If a distribution is roughly bell shaped, then
1. Approximately 68% of the data will lie within 1 standard deviation of the mean.
2. Approximately 95% of the data will lie within 2 standard deviations of the mean.
3. Approximately 99.7% of the data will lie within 3 standard deviations of the mean.
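The Empirical Rule is usually stated as roughly 68%, 95%, and 99.7% of the data lying within 1, 2, and 3 standard deviations of the mean. The sketch below checks those figures against an exactly normal distribution using Python's standard `statistics.NormalDist`; real bell-shaped data will only approximate them, and the helper name `proportion_within` is an illustrative choice.

```python
from statistics import NormalDist


def proportion_within(k: float) -> float:
    """Proportion of a normal distribution within k standard deviations of its mean."""
    standard_normal = NormalDist()  # mean 0, standard deviation 1
    return standard_normal.cdf(k) - standard_normal.cdf(-k)


for k in (1, 2, 3):
    print(k, round(proportion_within(k), 4))
# 1 0.6827
# 2 0.9545
# 3 0.9973
```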
Equally likely outcomes An experiment is said to have equally likely outcomes when each simple event has the same probability of occurring.
Event Any collection of outcomes from a probability experiment. An event may consist of one or more simple events. Events are denoted using capital letters such as E.
Expected value The expected value of a discrete random variable X, denoted $E(X)$, is obtained using the formula $E(X) = \sum x \cdot P(X = x)$, where x is a value of the random variable X and $P(X = x)$ is the probability of observing that value.
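A minimal sketch of the expected value calculation, assuming the probability distribution is supplied as a mapping from each value x to P(X = x). The fair six-sided die is an illustrative example, and exact fractions are used so no rounding is involved.

```python
from fractions import Fraction


def expected_value(distribution):
    """Sum of x * P(X = x) over every value x of a discrete random variable."""
    return sum(x * p for x, p in distribution.items())


# Illustrative distribution: a fair six-sided die.
die = {face: Fraction(1, 6) for face in range(1, 7)}

# The probabilities of a discrete distribution must sum to 1.
assert sum(die.values()) == 1

print(expected_value(die))  # 7/2
```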
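Experimental unit A person, object, or some other well-defined item that is a member of the population being studied and upon which a treatment is applied. (See Individual and Subject.)
Explained deviation The deviation between the predicted value of the response variable, $\hat{y}$, and the mean value of the response variable, $\bar{y}$.
Factorial symbol If $n \ge 0$ is an integer, the factorial symbol $n!$ is defined as $0! = 1$, $1! = 1$, and $n! = n(n - 1) \cdots 3 \cdot 2 \cdot 1$.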
Fences Cutoff points for determining outliers.
Five-number summary The five-number summary of a set of data consists of the smallest data value, $Q_1$, the median, $Q_3$, and the largest data value.
Frame A list of all the individuals within the population.
Frequency distribution A frequency distribution lists the number of occurrences for each category of data.
Frequency polygon A frequency polygon is drawn by plotting a point above each class midpoint on a horizontal axis at a height equal to the frequency of the class. After the points for each class are plotted, straight lines are drawn between consecutive points.
Goodness-of-fit test An inferential procedure used to determine whether a frequency distribution follows a claimed distribution.
Histogram A histogram is constructed by drawing rectangles for each class of data. The height of each rectangle is the frequency or relative frequency of the class. The width of each rectangle should be the same and the rectangles should touch each other.
Hypothesis A statement or claim regarding a characteristic of one or more populations.
Hypothesis testing A procedure, based on sample evidence and probability, used to test claims regarding a characteristic of one or more populations.
Independent Two events E and F are independent if the occurrence of event E in a probability experiment does not affect the probability of event F. Two events E and F are independent if and only if $P(F \mid E) = P(F)$ or $P(E \mid F) = P(E)$.
Independent sampling A sampling method is independent when the individuals selected for one sample do not dictate which individuals are to be in a second sample.
Individual A person, object, or some other well-defined item that is a member of the population being studied and upon which a treatment is applied. (See Experimental unit and Subject.)
Inferential statistics Inferential statistics uses methods that generalize results obtained from a sample to the population and measure their reliability.
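The Fences and Five-number summary entries above can be illustrated with a short Python sketch. Two assumptions go beyond the glossary: the fences are placed 1.5 interquartile ranges beyond the quartiles (a common convention), and quartiles are computed with the default method of `statistics.quantiles`; other quartile conventions can give slightly different values. The data set is made up for illustration.

```python
from statistics import quantiles


def five_number_summary(data):
    """Return (smallest value, Q1, median, Q3, largest value)."""
    ordered = sorted(data)
    q1, median, q3 = quantiles(ordered, n=4)  # default "exclusive" method
    return ordered[0], q1, median, q3, ordered[-1]


def fences(q1, q3, k=1.5):
    """Lower and upper fences, assuming the common 1.5 * IQR convention."""
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


data = [2, 4, 5, 7, 8, 9, 10, 12, 13, 15, 40]  # made-up observations
smallest, q1, median, q3, largest = five_number_summary(data)
lower, upper = fences(q1, q3)
outliers = [x for x in data if x < lower or x > upper]

print((smallest, q1, median, q3, largest))  # (2, 5.0, 9.0, 13.0, 40)
print((lower, upper))                       # (-7.0, 25.0)
print(outliers)                             # [40]
```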
Influential observation An observation that significantly affects the value of the slope and/or intercept of the least-squares regression line.
Interquartile range The interquartile range, or IQR, is the difference between the third and first quartiles: $IQR = Q_3 - Q_1$.
Kruskal-Wallis test A nonparametric test that is used to test the claim that k independent samples come from populations with the same distribution.
Least-squares regression line The least-squares regression line is the one that minimizes the sum of the squared errors. It is the line that minimizes the sum of the squared vertical distances between the observed values of y and those predicted by the line, $\hat{y}$; that is, it minimizes $\sum (y_i - \hat{y}_i)^2$.
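A minimal sketch of fitting a least-squares regression line in Python. The slope and intercept formulas used here are the standard closed-form solution, which this glossary does not state explicitly, and the data points and the function name `least_squares_line` are invented for illustration.

```python
def least_squares_line(xs, ys):
    """Return (intercept b0, slope b1) of the line minimizing the sum of squared errors."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    # Standard closed-form solution: b1 = Sxy / Sxx, b0 = y_bar - b1 * x_bar.
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    b1 = s_xy / s_xx
    b0 = y_bar - b1 * x_bar
    return b0, b1


xs = [1, 2, 3, 4]  # predictor values (made up)
ys = [2, 4, 5, 8]  # response values (made up)
b0, b1 = least_squares_line(xs, ys)

# Residual: observed y minus predicted y.
residuals = [y - (b0 + b1 * x) for x, y in zip(xs, ys)]
sse = sum(r ** 2 for r in residuals)

print(round(b0, 6), round(b1, 6))  # 0.0 1.9
print(round(sse, 6))               # 0.7
```

Any other line through these points gives a larger sum of squared residuals than the fitted one, which is the defining property in the entry above.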
Least-squares regression model The least-squares regression model is given by $y_i = \beta_0 + \beta_1 x_i + \varepsilon_i$,
where $y_i$ is the value of the response variable for the ith individual, $\beta_0$ and $\beta_1$ are the parameters to be estimated based upon sample data, $x_i$ is the value of the predictor variable for the ith individual, $\varepsilon_i$ is a random error term with mean 0 and variance $\sigma^2_{\varepsilon_i} = \sigma^2$ (the error terms are independent), and $i = 1, \ldots, n$, where n is the sample size (number of ordered pairs in the data set).
Level of confidence The level of confidence in a confidence interval is a probability that represents the percentage of intervals that will contain $\mu$ if a large number of repeated samples are obtained. The level of confidence is denoted $(1 - \alpha) \cdot 100\%$.
Level of significance The level of significance, $\alpha$, is the probability of making a Type I error.
Linear correlation coefficient The linear correlation coefficient or Pearson product moment correlation coefficient is a measure of the strength of linear relation between two quantitative variables. The Greek letter $\rho$ (rho) is used to represent the population correlation coefficient and r to represent the sample correlation coefficient. The formula for the sample correlation coefficient is $r = \dfrac{\sum \left( \dfrac{x_i - \bar{x}}{s_x} \right) \left( \dfrac{y_i - \bar{y}}{s_y} \right)}{n - 1}$,
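where $\bar{x}$ and $s_x$ are the sample mean and sample standard deviation of the predictor variable, $\bar{y}$ and $s_y$ are the sample mean and sample standard deviation of the response variable, and n is the number of individuals in the sample.
Logarithm to the base a The logarithm to the base a, where $a > 0$ and $a \ne 1$, is denoted $y = \log_a x$ and is defined by $y = \log_a x$ if and only if $x = a^y$.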
In order for the logarithm to be defined, x must be greater than 0.
Lower class limit The smallest value within a class.
Lurking variable A variable that is related to either the response or predictor variable, or both, but is excluded from the analysis.
Mann-Whitney test A nonparametric procedure that is used to test the equality of two population medians from independent samples.
Matched-pairs design A randomized block design in which the experimental units are related in some way.
Margin of error The margin of error, E, in a $(1 - \alpha) \cdot 100\%$ confidence interval in which $\sigma$ is known is given by $E = z_{\alpha/2} \cdot \dfrac{\sigma}{\sqrt{n}}$,
where n is the sample size. Note: The population from which the sample was drawn is required to be normally distributed or the sample size n must be greater than or equal to 30.
Marginal distribution A frequency or relative frequency distribution of either the row or column variable in the contingency table.
Mean See Arithmetic mean.
Median The median of a variable is the value that lies in the middle of the data when arranged in ascending order. That is, half the data are below the median and half the data are above the median. We use M to represent the median.
Mode The mode of a variable is the most frequent observation of the variable that occurs in the data set.
Multimodal A data set is multimodal if it has three or more modes.
Mutually exclusive If events E and F have no simple events in common or cannot occur simultaneously, they are said to be disjoint or mutually exclusive.
Negatively associated Two variables that are linearly related are said to be negatively associated when above-average values of one variable are associated with below-average values of the other variable. That is, two variables are negatively associated if, as the values of the predictor variable increase, the values of the response variable tend to decrease.
Nonparametric statistical procedures Inferential procedures that are not based upon parameters and that require fewer assumptions to be satisfied in order to perform the tests. They do not require that the population follow a specific type of distribution (such as the normal distribution) and, therefore, are often referred to as distribution-free procedures.
Nonsampling errors Errors that result from the survey process. They are due to the nonresponse of individuals selected to be in the survey, to inaccurate responses, to poorly worded questions, to bias in the selection of individuals to be in the survey, and so on.
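For the Margin of error entry above, the usual form when the population standard deviation is known is E = z(α/2) · σ / √n. The Python sketch below computes it with the standard `statistics.NormalDist`; the values σ = 15, n = 36, and the 95% level are assumptions chosen only for illustration.

```python
from math import sqrt
from statistics import NormalDist


def margin_of_error(sigma: float, n: int, confidence: float = 0.95) -> float:
    """Margin of error for a mean when the population standard deviation is known."""
    alpha = 1 - confidence
    # Critical value z_{alpha/2}: the z-score with area alpha/2 to its right.
    z_critical = NormalDist().inv_cdf(1 - alpha / 2)
    return z_critical * sigma / sqrt(n)


e = margin_of_error(sigma=15, n=36)  # hypothetical inputs
print(round(e, 2))  # 4.9
```

Raising the sample size shrinks the margin of error at a fixed confidence level, since n appears under the square root in the denominator.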
Normal probability distribution If a continuous random variable is normally distributed or has a normal probability distribution, then a relative frequency histogram of the random variable has the shape of a normal curve (bell-shaped and symmetric).
Normal probability plot A graph that plots observed data versus normal scores.
Normal score The expected z-score of the data value if the distribution of the random variable is normal.
Null hypothesis The null hypothesis, denoted $H_0$ (read "H-naught"), is a statement to be tested. The null hypothesis is assumed true until evidence indicates otherwise.
Observational study An observational study measures the characteristics of a population by studying individuals in a sample, but does not attempt to manipulate or influence the variable(s) of interest. Observational studies are sometimes referred to as ex post facto (after-the-fact) studies because the value of the variable of interest has already been established.
Ogive An ogive (read as "oh jive") is a graph that represents the cumulative frequency or cumulative relative frequency for the class. It is constructed by plotting points whose x-coordinates are the upper class limits and whose y-coordinates are the cumulative frequencies or cumulative relative frequencies. After the points for each class are plotted, straight lines are drawn between consecutive points.
One-sample sign test The one-sample sign test converts data to plus and minus signs in order to test a claim regarding the median.
Open-ended table A table in which the last class does not have an upper class limit.
Open question A question in which the respondent is free to choose his or her response.
Outliers Extreme observations in the data set.
Parameter A descriptive measure of a population.
Parametric statistical procedures Inferential procedures that rely on testing claims regarding parameters such as the population mean, $\mu$, the population standard deviation, $\sigma$, or the population proportion, p.
In some circumstances, the use of parametric procedures requires that certain assumptions regarding the distribution of the population, such as normality, be satisfied.
Pareto chart A bar graph whose bars are drawn in decreasing order of frequency or relative frequency.
Pearson product moment correlation coefficient See Linear correlation coefficient.
Percentile, kth The kth percentile, denoted $P_k$, of a set of data divides the lower k% of a data set from the upper (100 - k)% of the set. Percentiles divide a data set that is written in ascending order into 100 parts, so that there are 99 possible percentiles that can be computed.
Permutation An ordered arrangement in which r objects are chosen from n distinct (different) objects and repetition is not allowed. The symbol ${}_nP_r$ represents the number of permutations of r objects selected from n objects.
Pie chart A circle divided into sectors. Each sector represents a category of data. The area of each sector is proportional to the frequency of the category.
Point estimate A point estimate of a parameter is the value of a statistic that estimates the value of the parameter.
Poisson process A random variable X, the number of successes in a fixed interval, follows a Poisson process provided the following conditions are met:
1. The probability of two or more successes in any sufficiently small subinterval is 0.
2. The probability of success is the same for any two intervals of equal length.
3. The number of successes in any interval is independent of the number of successes in any other interval, provided the intervals do not overlap.
Population The group that is to be studied.
Population arithmetic mean The population arithmetic mean, $\mu$ (read "mew"), is computed using all the individuals in a population. The population mean is a parameter.
Population mean of a variable from a frequency distribution $\mu = \dfrac{\sum x_i f_i}{\sum f_i}$,
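where $x_i$ is the midpoint or value of the ith class and $f_i$ is the frequency of the ith class.
Population standard deviation The population standard deviation, $\sigma$, is obtained by taking the square root of the population variance. That is, $\sigma = \sqrt{\sigma^2}$.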
Population variance The population variance of a variable is the sum of the squared deviations about the population mean divided by the number of observations in the population, N. That is, it is the arithmetic mean of the squared deviations about the population mean. The population variance is symbolically represented by $\sigma^2$ and is computed as $\sigma^2 = \dfrac{\sum (x_i - \mu)^2}{N}$,
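where $x_1, x_2, \ldots, x_N$ are the N observations in the population and $\mu$ is the population mean. NOTE: In using the formula, do not round until the last computation. Use as many decimal places as allowed by your calculator in order to avoid round-off errors.
Population variance of a variable from a frequency distribution $\sigma^2 = \dfrac{\sum (x_i - \mu)^2 f_i}{\sum f_i}$,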
where $x_i$ is the midpoint or value of the ith class, $f_i$ is the frequency of the ith class, and $\mu$ is the population mean.
Positively associated Two variables that are linearly related are said to be positively associated when above-average values of one variable are associated with above-average values of the other variable. That is, two variables are positively associated if, as the values of the predictor variable increase, the values of the response variable also tend to increase.
Power curve A graph that shows the power of the test against values of the population mean that make the null hypothesis false.
Power of a test The value of $1 - \beta$, where $\beta$ is the probability of making a Type II error. This is the probability of rejecting the null hypothesis when the alternative hypothesis is true. The higher the power of the test, the more likely the test will reject the null hypothesis when the alternative hypothesis is true.
Practical significance Practical significance refers to the idea that a small difference can be statistically significant yet not large enough to have any practical value.
Prediction intervals Intervals constructed about the predicted value of y that are used to measure the accuracy of a single individual's predicted value.
Predictor variables The factors that affect the response variable.
Probability A measure of the likelihood of a random phenomenon or chance behavior.
Probability density function A probability density function is an equation used to compute probabilities of continuous random variables that must satisfy the following two properties.
1. The area under the graph of the equation over all possible values of the random variable must equal 1.
2. The graph of the equation must be greater than or equal to 0 for all possible values of the random variable.
Probability distribution The probability distribution of a random variable X provides the possible values of the random variable and their corresponding probabilities. A probability distribution can be in the form of a table, graph, or mathematical formula.
Probability histogram A histogram in which the horizontal axis corresponds to the value of the random variable and the vertical axis represents the probability of that value of the random variable.
P-value A P-value is the probability of observing a sample statistic as extreme or more extreme than the one observed, under the assumption that the null hypothesis is true.
Qualitative data Observations corresponding to a qualitative variable.
Qualitative variables Qualitative, or categorical, variables allow for classification of individuals based on some attribute or characteristic.
Quantitative data Observations corresponding to a quantitative variable.
Quantitative variables Quantitative variables provide numerical measures of individuals. Arithmetic operations such as addition and subtraction can be performed on the values of a quantitative variable and provide meaningful results.
Quartile The percentiles that divide data sets into fourths, or four equal parts.
Random variable A numerical measure of the outcome of a probability experiment. Its value is determined by chance. Random variables are denoted using letters such as X.
Rank-correlation test A nonparametric procedure that is used to test claims regarding the association between two variables.
Range The range, R, of a variable is the difference between the largest data value and the smallest data value. That is, $R = \text{largest data value} - \text{smallest data value}$.
Rejection region The critical region or rejection region is the set of all values such that the null hypothesis is rejected.
Relative frequency The proportion or percent of observations within a category. It is found using the formula $\text{relative frequency} = \dfrac{\text{frequency}}{\text{sum of all frequencies}}$.
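A short Python sketch of the relative frequency calculation, dividing each category's frequency by the sum of all frequencies. The survey responses are made up for illustration.

```python
from collections import Counter

# Made-up categorical observations.
responses = ["A", "B", "A", "C", "A", "B"]

# Frequency distribution: number of occurrences of each category.
frequency = Counter(responses)
total = sum(frequency.values())

# Relative frequency distribution: frequency / sum of all frequencies.
relative_frequency = {category: count / total for category, count in frequency.items()}

for category in sorted(relative_frequency):
    print(category, frequency[category], round(relative_frequency[category], 3))
# A 3 0.5
# B 2 0.333
# C 1 0.167
```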
Relative frequency distribution A relative frequency distribution lists the relative frequency of each category of data.
Residual The difference between the observed value of y and the predicted value of y is the error or residual.
Response variable A quantitative or qualitative variable that represents the variable of interest. The response variable is the variable whose value can be explained by, or is determined by, the value of the predictor variable.
Robust A procedure is robust when minor departures from normality will not seriously affect the results.
Row variable The category represented by the rows of a contingency table.
Run A sequence of similar events, items, or symbols that is followed by an event, item, or symbol that is mutually exclusive from the first event, item, or symbol. The number of events, items, or symbols in a run is called its length.
Runs test for randomness A procedure used to test claims that data have been obtained or occur randomly.
Sample A subset of the population.
Sample arithmetic mean The sample arithmetic mean, $\bar{x}$ (read "x-bar"), is computed using sample data. The sample mean is a statistic.
Sample correlation coefficient See Linear correlation coefficient.
Sample mean of a variable from a frequency distribution $\bar{x} = \dfrac{\sum x_i f_i}{\sum f_i}$,
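where $x_i$ is the midpoint or value of the ith class and $f_i$ is the frequency of the ith class.
Sample space The collection of all possible simple events.
Sample standard deviation The sample standard deviation, s, is obtained by taking the square root of the sample variance. That is, $s = \sqrt{s^2}$.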
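The relationship between the sample standard deviation and the sample variance (defined in the next entry, with divisor n - 1) can be checked with Python's standard `statistics` module, which also provides the population versions (divisor N). The data set is made up for illustration.

```python
from math import isclose, sqrt
from statistics import mean, pstdev, pvariance, stdev, variance

data = [2, 4, 4, 4, 5, 5, 7, 9]  # made-up observations
n = len(data)
x_bar = mean(data)

# Sum of the squared deviations about the mean.
sum_sq_dev = sum((x - x_bar) ** 2 for x in data)

# Sample variance divides by n - 1; population variance divides by N.
assert isclose(variance(data), sum_sq_dev / (n - 1))
assert isclose(pvariance(data), sum_sq_dev / n)

# Each standard deviation is the square root of the matching variance.
assert isclose(stdev(data), sqrt(variance(data)))
assert isclose(pstdev(data), sqrt(pvariance(data)))

print(round(stdev(data), 4), round(pstdev(data), 4))  # 2.1381 2.0
```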
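Sample variance The sample variance, $s^2$, is computed by determining the sum of the squared deviations about the sample mean and dividing the result by $n - 1$. That is, $s^2 = \dfrac{\sum (x_i - \bar{x})^2}{n - 1}$,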
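where $x_1, x_2, \ldots, x_n$ are the n observations in the sample and $\bar{x}$ is the sample mean.
Sample variance of a variable from a frequency distribution $s^2 = \dfrac{\sum (x_i - \bar{x})^2 f_i}{\left( \sum f_i \right) - 1}$,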
where $x_i$ is the midpoint or value of the ith class, $f_i$ is the frequency of the ith class, and $\bar{x}$ is the sample mean.
Sampling distribution of the mean A probability distribution of all possible values of the random variable $\bar{x}$, the sample mean, computed from samples of size n.
Sampling error The error that results from using sampling to estimate information regarding a population. This type of error occurs because a sample gives incomplete information about the population.
Scatter diagram A scatter diagram is a graph that shows the relationship between two quantitative variables measured on the same individual. Each individual in the data set is represented by a point in the scatter diagram. The predictor variable is plotted on the horizontal axis and the response variable is plotted on the vertical axis. Do not connect the points when drawing a scatter diagram.
Simple event Any single outcome from a probability experiment. Each simple event is denoted $e_i$.
Simple random sampling A sample of size n from a population of size N is obtained through simple random sampling if every possible sample of size n has an equally likely chance of occurring. The sample is then called a simple random sample.
Skewed left A probability distribution in which the tail to the left of the highest frequency is longer than the tail to the right of the highest frequency.
Skewed right A probability distribution in which the tail to the right of the highest frequency is longer than the tail to the left of the highest frequency.
Standard error of the mean The standard deviation of the sampling distribution of $\bar{x}$.
Statistic A descriptive measure of a sample.
Statistics The science of collecting, organizing, summarizing, and analyzing information in order to draw conclusions.
Stem-and-leaf plot The stem-and-leaf plot is constructed as follows.
Step 1: The stem of the graph will consist of the digits to the left of the rightmost digit. The leaf of the graph will be the rightmost digit. The choice of the stem depends upon the class width desired.
Step 2: Write the stems in a vertical column in increasing order. Draw a vertical line to the right of the stems.
Step 3: Write each leaf corresponding to the stems to the right of the vertical line. The leaves must be written in ascending order.
Stratified sample A stratified sample is obtained by separating the population into nonoverlapping groups called strata and then obtaining a simple random sample from each stratum. The individuals within each stratum should be homogeneous (or similar) in some way.
Subject A person, object, or some other well-defined item that is a member of the population being studied and upon which a treatment is applied. (See Experimental unit and Individual.)
Subjective probabilities Probabilities obtained based upon an educated guess.
Systematic sample A systematic sample is obtained by selecting every kth individual from the population. The first individual selected corresponds to a random number between 1 and k.
Time series plot A time series plot is obtained by plotting the time in which a variable is measured on the horizontal axis and the corresponding value of the variable on the vertical axis. Lines are then drawn connecting the points.
T-interval A confidence interval using the t-distribution.
Total deviation The deviation between the observed value of the response variable, y, and the mean value of the response variable, $\bar{y}$.
Treatment A condition applied to the experimental unit.
Trial Each repetition of an experiment.
Type I error An incorrect decision in which $H_0$ is rejected when in fact $H_0$ is true.
Type II error An incorrect decision in which $H_0$ is not rejected when in fact $H_1$ is true.
Unbiased estimator A statistic is an unbiased estimator provided its expected value is equal to the value of the parameter.
Unexplained deviation The deviation between the observed value of the response variable, y, and the predicted value of the response variable, $\hat{y}$.
Uniform distribution A probability distribution in which the frequency of each value of the variable is evenly spread out across the values of the variable.
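The Systematic sample entry above describes a simple procedure: pick a random starting position between 1 and k, then take every kth individual. A minimal Python sketch follows; the function name `systematic_sample`, the numbered population, and the fixed seed are all illustrative assumptions.

```python
import random


def systematic_sample(population, k, rng=None):
    """Select every kth individual, starting at a random position between 1 and k."""
    rng = rng or random.Random()
    start = rng.randint(1, k)  # 1-based position of the first individual selected
    return population[start - 1::k]


# Hypothetical frame: individuals numbered 1 through 100.
population = list(range(1, 101))
sample = systematic_sample(population, k=10, rng=random.Random(0))

print(len(sample))  # 10
print(sample[:3])   # first three selected individuals, spaced k apart
```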
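Unusual event An event that has a low probability of occurring.
Univariate data Data in which a single variable was measured for each individual in the study.
Upper class limit The largest value within a class.
Variables The characteristics of the individuals within the population.
Venn diagrams Pictures in which events are represented as circles enclosed in a rectangle.
Weighted mean The weighted mean, $\bar{x}_w$, of a variable is found by multiplying each value of the variable by its corresponding weight, summing these products, and dividing the result by the sum of the weights. That is, $\bar{x}_w = \dfrac{\sum w_i x_i}{\sum w_i}$,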
where $w_i$ is the weight of the ith observation and $x_i$ is the value of the ith observation.
Wilcoxon matched-pairs signed-ranks test A nonparametric procedure that is used to test the equality of two population medians from dependent samples.
Z-interval A confidence interval using z-scores.
Z-score The z-score represents the number of standard deviations that a data value is from the mean. It is obtained by subtracting the mean from the data value and dividing this result by the standard deviation. There are both a population z-score and a sample z-score; their formulas follow: population z-score $z = \dfrac{x - \mu}{\sigma}$; sample z-score $z = \dfrac{x - \bar{x}}{s}$.
The z-score is unitless; it has mean 0 and standard deviation 1.
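The closing remark about z-scores can be verified numerically. The Python sketch below computes population z-scores, (x - mean) / standard deviation, for a made-up data set treated as a whole population, then confirms that the z-scores themselves have mean 0 and standard deviation 1.

```python
from statistics import mean, pstdev


def z_scores(data):
    """Population z-score of each value: (x - mu) / sigma."""
    mu = mean(data)
    sigma = pstdev(data)  # population standard deviation (divisor N)
    return [(x - mu) / sigma for x in data]


data = [2, 4, 4, 4, 5, 5, 7, 9]  # made-up population; mean 5, standard deviation 2
z = z_scores(data)

print(z)          # [-1.5, -0.5, -0.5, -0.5, 0.0, 0.0, 1.0, 2.0]
print(mean(z))    # 0.0
print(pstdev(z))  # 1.0
```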