
# MINITAB Project

STATISTICS EXPLORATION # 5
DISCRETE PROBABILITY DISTRIBUTIONS

PURPOSE - to use MINITAB to

• enhance the understanding of discrete random variables and their probability distributions
• explore, through simulations and graphs, the mean (expected value) of a discrete random variable
• explore, through simulations and graphs, the variance (standard deviation) of a discrete random variable
• explore, through simulations and graphs, some theoretical and empirical probability distributions for discrete random variables
• produce probabilities associated with some common discrete probability distributions

BACKGROUND INFORMATION

• A discrete random variable X is a rule which assigns a real value x to each outcome of a probability experiment.
• Note: The x-values are values of the random variable X.
• Note: The convention is to use upper case letters to represent the random variable and the corresponding lower case letters to represent the values of the random variable.
• Example: (Revisited from exploration #4) Consider the selection of two items from a production line. Let D be the event of a defective item and let N be the event of a non-defective item.
• Recall that the sample space was given by S = {DD, DN, ND, NN}.

Let X = number of defective items. Then, the possible values of X are 0, 1, and 2. That is, x = 0, 1, or 2.

The relationship between the sample space S and the values of X is shown in the following diagram.

Relationship between the Sample Space and the Values of X

Observe that through the definition of X, the points in the sample space S are associated with only one of the values of X. (These values are points on the number line). Thus, by definition, X is a function or rule. We call X a random variable.

Note: Since the values of X are discrete, we call X a discrete random variable.

• The probability distribution for a discrete random variable is the distribution of the values of the random variable and the associated probabilities.
• In the example above, the probability distribution for the number of defective items will be

| x-values | P(x) |
|---|---|
| 0 | 0.25 |
| 1 | 0.50 |
| 2 | 0.25 |

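• Note: For a discrete probability distribution the following hold:
1. 0 ≤ P(x) ≤ 1 for every value x, and
2. $\sum P(x) = 1$, where the sum is taken over all values x.
• The mean or the expected value for the random variable X, denoted by μ = E(X), is defined to be $\mu = E(X) = \sum x \, P(x)$.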
• The expected value for a random variable is the long term average of the variable. It is the average value of the random variable when the experiment is repeated a large number of times. For instance, in the example above, we can repeat the observation of two items and record the number of defectives for each experiment. If we repeat an infinite number of times and find the average of the number of defectives, this value will be the expected value for the number of defectives when we observe two items.
• The variance for a discrete random variable X, denoted by σ² = V(X), is defined to be $\sigma^2 = V(X) = \sum (x - \mu)^2 \, P(x)$.
• Note: The standard deviation, denoted by σ, for the random variable X is obtained by taking the square root of the variance.
• Note: The standard deviation and the variance are measures of the dispersion of the population of x-values generated by X about the mean μ.
• Many real life situations (probability experiments) can be modeled by a particular probability experiment. One such experiment is the Binomial experiment.
• A binomial experiment is a probability experiment with the following properties:
• There are two possible outcomes (success or failure) on each trial of the experiment.
• There is a fixed number of trials in the experiment, say n
• The probability of success, say p, is fixed from trial to trial in the experiment.
• The trials in the experiment are independent.
• Binomial experiments occur quite frequently in the real world, and a model has been developed to help compute the probabilities associated with such experiments.
• If we are interested in the number of successes in a binomial experiment, then the associated random variable is called a binomial random variable.
• The function that generates binomial probabilities is given by: $P(X = x) = \dfrac{n!}{x!\,(n-x)!} \, p^x (1-p)^{n-x}$ for x = 0, 1, 2, …, n,
• where n! is read as "n factorial".

Note: n! = n × (n−1) × (n−2) × (n−3) × … × 3 × 2 × 1

• Example: What is the value of 5!?
• 5! = 5 × 4 × 3 × 2 × 1 = 120

• Example: What is the value of 10!?
• 10! = 10 × 9 × 8 × … × 3 × 2 × 1 = 3,628,800

• Note: We define 0! = 1 and 1! = 1.
• Note: The binomial random variable is a discrete random variable since its values are discrete.
• Note: The mean and variance of a binomial random variable are given by μ = n × p and σ² = n × p × (1 − p) respectively.
• Example: If a student guesses on a 20-question multiple-choice exam, would this constitute a binomial experiment if we were interested in the number of correct guesses? We will assume that each question has four possible responses, only one of which is correct, and that the questions are independent of each other.
• This is a binomial experiment since
• There is a fixed number of trials of n = 20.
• There are two possible outcomes on each trial. Since the student is guessing, the student will either guess correctly or guess incorrectly.
• The probability of success, that is the probability of guessing correctly is p = 1/4. This probability will remain fixed from trial (problem) to trial (problem) since there is only one correct response for each question.
• The trials are independent of each other since we are given that the questions are independent of each other.
• Many real life situations (probability experiments) can be modeled by a particular probability process. One such process is the Poisson process.
• A Poisson process or a Poisson experiment is a probability experiment with the following properties:
• The experiment consists of counting the number of successes during an interval of time (or area or volume).
• The probability that a success occurs in a given interval of time (or area or volume) is the same for all intervals of equal size.
• The probability of observing two or more successes in a sufficiently small interval of time (or area or volume) is essentially zero.
• The occurrence of a success in any interval of time is independent of that in any other interval of time (or area or volume).
• The following are some examples which follow a Poisson process:
• The number of telephone calls per hour at a switchboard.
• The number of e-mails received per hour.
• The number of chips per cookie in a box of chocolate-chip cookies.
• The number of patients admitted to a hospital emergency room per day.
• The number of defective items manufactured per 4-hour period in a manufacturing process.
• Etc.
• Each of the preceding examples is representative of a Poisson process or experiment.
• Poisson processes occur quite frequently in the real world, and a model has been developed to help compute the probabilities associated with such processes.
• If we are interested in the number of successes in a Poisson process or experiment, then the associated random variable is called a Poisson random variable.
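• The function that generates Poisson probabilities is given by: $P(X = x) = \dfrac{\mu^x e^{-\mu}}{x!}$ for x = 0, 1, 2, …, where μ is the mean number of successes in the given interval.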
• Note: The Poisson random variable is a discrete random variable since its values are discrete.
• Note: The mean and variance of a Poisson random variable are given by μ and σ² = μ. This implies that the standard deviation for a Poisson random variable is $\sigma = \sqrt{\mu}$.
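The binomial and Poisson probability functions described above can be sketched in a few lines of Python. This is only an illustration outside of MINITAB; the function names `binomial_pmf` and `poisson_pmf` are hypothetical, and the Poisson mean of 6 is an arbitrary illustrative value.

```python
from math import comb, exp, factorial

def binomial_pmf(x, n, p):
    """P(X = x) for a binomial random variable with n trials and success probability p."""
    return comb(n, x) * p**x * (1 - p)**(n - x)

def poisson_pmf(x, mu):
    """P(X = x) for a Poisson random variable with mean mu."""
    return mu**x * exp(-mu) / factorial(x)

# Guessing on a 20-question exam with four choices per question (from the text).
n, p = 20, 0.25
exam_mean = n * p                # mu = n * p
exam_variance = n * p * (1 - p)  # sigma^2 = n * p * (1 - p)

# The probabilities over x = 0, 1, ..., n should add up to 1.
exam_total = sum(binomial_pmf(x, n, p) for x in range(n + 1))

print("mean:", exam_mean, "variance:", exam_variance)
print("sum of binomial probabilities:", round(exam_total, 10))
print("Poisson P(X = 0) with mean 6:", round(poisson_pmf(0, 6), 6))
```

For the guessing example, the student would expect 5 correct answers on average, with a variance of 3.75.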

PROCEDURES

First, load the MINITAB (Windows version) software as described in Exploration #0.

NOTE: The procedures presented in these explorations may not be the only way to achieve the end results. Also, whenever graphs are presented, only the MINITAB graphics features will be used.

1. DISCRETE PROBABILITY DISTRIBUTIONS

This section will illustrate how MINITAB can be used to generate means and standard deviations (variances) for discrete random variables.

Example 1: Consider a discrete random variable having the following probability distribution table:

| x | 1 | 2 | 3 | 4 | 5 |
|---|---|---|---|---|---|
| P(x) | 0.16 | 0.26 | 0.30 | 0.17 | 0.11 |

Recall, in order for us to compute the expected value or the mean for the variable X we can use the formula $\mu = \sum x \, P(x)$. Thus, applying the formula, we have

μ = 1 × 0.16 + 2 × 0.26 + 3 × 0.30 + 4 × 0.17 + 5 × 0.11 = 2.81.

If we use MINITAB to compute the expected value, first we have to enter the x-values in column C1, and the P(x) values in column C2. Rename the columns as VALUES and P(x) respectively. Label column C3 as MEAN.

In order to compute the expected value, select Calc → Calculator and the Calculator dialog box will appear. We need to multiply the VALUES with the corresponding P(x) values and then sum all the products. To achieve this, fill in the dialog box as shown in Figure 5.1.

Figure 5.1: Calculator Dialog box with entries to compute the Mean or Expected Value

Select the OK button and the expected value will be computed and saved in column C3 (MEAN). Figure 5.2 shows the data window with the results. Observe that the mean or expected value is 2.81. This was the same value obtained with the formula.

Figure 5.2: Data sheet with the Expected Value
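The same calculation can be checked outside of MINITAB. The following is a minimal Python sketch of the multiply-and-sum step; the variable names `values` and `probs` are illustrative stand-ins for the VALUES and P(x) columns.

```python
values = [1, 2, 3, 4, 5]
probs = [0.16, 0.26, 0.30, 0.17, 0.11]

# A valid discrete distribution has probabilities in [0, 1] that sum to 1.
assert all(0 <= p <= 1 for p in probs)
assert abs(sum(probs) - 1.0) < 1e-9

# Multiply each value by its probability, then sum the products.
mean = sum(x * p for x, p in zip(values, probs))
print(round(mean, 2))
```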

Example 2: Consider the distribution for a discrete random variable given in Example 1. Simulate values from this distribution to illustrate the concept of the mean or expected value for the variable.

Procedure: We will simulate in increments of 500 up to 5000 values from the probability distribution and save in appropriate columns. You may name them as S500, S1000, S1500, etc. We will then compute the means (averages) for these simulated values and plot the means against the number of simulations and observe what is happening to the means as the number of simulations is increasing.

To simulate a sample of size 500 from the distribution, select Calc → Random Data → Discrete and the Discrete Distribution dialog box will be displayed. Fill in as shown in Figure 5.3. Note: The Store in column(s) text box has S500 listed. This is the name of the column where the 500 simulated values will be saved. You can use other labels for your columns for the simulated values if you wish.

Figure 5.3: Discrete Distribution dialog box with entries to simulate 500 values

Click on the OK button and the 500 values will be simulated from the distribution and saved in column (renamed) S500.

Repeat the simulation for increments of 500 up to 5000. That is, the next simulation will be a sample of size 1000, the next 1500, the next 2000, etc. until 5000.

Figure 5.4 shows a portion of the output of the descriptive statistics for a set of simulations. In particular, observe the means. All of these values are approximately equal to 2.8. This is very close to the expected value of 2.81.

Note: Your simulations should display different results.

Figure 5.4: Partial Descriptive Statistics output for a set of simulated values

Next we use the edit and copy feature of the software to copy the information shown in Figure 5.4 and paste it in a worksheet. The purpose of this is to compute descriptive statistics for the means. The descriptive statistics output for the means is shown in Figure 5.5.

Figure 5.5: Descriptive Statistics for the simulated means

Observe that in this case, the mean of the means is 2.7964 ≈ 2.8 (this is after a total of 500 + 1000 + 1500 + … + 5000 = 27,500 simulations). This value is close to 2.81. If we had simulated more values and computed the mean of the means, the value would tend to be closer to 2.81. This is so because the expected value of a variable is a long term average (mean). That is, it is the mean of the distribution of values and hence it is the mean of the population of these values. Thus, the larger the sample taken from the distribution (population) of values, the closer the sample mean tends to be to the expected value or the population mean.

Note: This is an application of the Law of Large Numbers.

Figure 5.6 shows a plot of the simulated means and the expected value of 2.81. Observe that most of the values are close to a value of 2.81.

Figure 5.6: Plot of the simulated means along with the Expected Value of 2.81
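The simulation in Example 2 can also be sketched in Python as a rough cross-check. This is not the MINITAB procedure itself: it assumes `random.choices` as a stand-in for Calc → Random Data → Discrete, and the seed value is arbitrary, so the numbers will differ from those in Figures 5.4 and 5.5.

```python
import random
from statistics import mean

values = [1, 2, 3, 4, 5]
probs = [0.16, 0.26, 0.30, 0.17, 0.11]

random.seed(5)  # arbitrary seed so the run is repeatable

# Simulate samples of size 500, 1000, ..., 5000 and record each sample mean.
sample_means = {}
for size in range(500, 5001, 500):
    sample = random.choices(values, weights=probs, k=size)
    sample_means[size] = mean(sample)

for size, m in sample_means.items():
    print(f"{size:5d}  {m:.4f}")

total_simulated = sum(sample_means)  # sum of the sample sizes (dict keys)
mean_of_means = mean(list(sample_means.values()))
print("total values simulated:", total_simulated)
print("mean of the means:", round(mean_of_means, 4))
```

Each sample mean should fall near the expected value of 2.81, which is the behavior the Law of Large Numbers describes.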

Example 3: Consider a discrete random variable having the following probability distribution table:

| x | 1 | 2 | 3 | 4 | 5 |
|---|---|---|---|---|---|
| P(x) | 0.16 | 0.26 | 0.30 | 0.17 | 0.11 |

In order for us to compute the variance for the variable X we can use the formula $\sigma^2 = \sum (x - \mu)^2 \, P(x)$. Thus, applying the formula, we have

σ² = (1 − 2.81)² × 0.16 + (2 − 2.81)² × 0.26 + (3 − 2.81)² × 0.30 + (4 − 2.81)² × 0.17 + (5 − 2.81)² × 0.11

= 1.4739.

Recall, if we take the square root of the variance we will obtain the standard deviation σ. Thus, σ = √1.4739 = 1.2140.

If we use MINITAB to compute the variance, first we have to enter x-values in column C1, and the P(x) values in column C2. Rename the columns as VALUES and P(x) respectively. Label column C3 as VARIANCE.

In order to compute the variance, select Calc → Calculator and the Calculator dialog box will appear. We need to subtract the mean from each value, then square the results, then multiply these squared values with the corresponding P(x) values and then sum all the products. To achieve this, fill in the dialog box as shown in Figure 5.7.

Figure 5.7: Calculator Dialog box with entries to compute the Variance

Select the OK button and the variance will be computed and saved in column C3 (VARIANCE). Figure 5.8 shows the data window with the results. Observe that the value of the variance is 1.4739. This was the same value obtained with the formula.

Figure 5.8: Data sheet with the Variance

To determine the standard deviation, all you need to do in the Calculator dialog box is to take the square root of the variance. Figure 5.9 shows the dialog box with the appropriate entries. Note that the standard deviation will be saved in column C4 (STDEV).

Figure 5.9: Calculator Dialog box with entries to compute the Standard Deviation

Click on the OK button and the value of 1.2140 for the standard deviation will be displayed in column C4. Figure 5.10 shows the values of the variance and standard deviation for the probability distributions.

Figure 5.10: Display with the variance and standard deviation
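As a cross-check on the Calculator steps, the variance and standard deviation can be computed with a short Python sketch. The variable names are illustrative; the steps mirror the description above (subtract the mean, square, weight by P(x), sum, then take the square root).

```python
from math import sqrt

values = [1, 2, 3, 4, 5]
probs = [0.16, 0.26, 0.30, 0.17, 0.11]

# Mean: sum of x * P(x).
mean = sum(x * p for x, p in zip(values, probs))

# Variance: sum of (x - mean)^2 * P(x).
variance = sum((x - mean) ** 2 * p for x, p in zip(values, probs))

# Standard deviation: square root of the variance.
std_dev = sqrt(variance)

print(f"variance = {variance:.4f}")
print(f"standard deviation = {std_dev:.4f}")
```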

Example 4: Consider the distribution for a discrete random variable given in Example 1. Simulate values from this distribution to illustrate the concept of the standard deviation for the variable.

Note: By analyzing the standard deviation you are indirectly analyzing the variance since the variance is the square of the standard deviation.

Procedure: We will simulate in increments of 500 up to 5000 values from the probability distribution and save in appropriate columns. You may name them as S500, S1000, S1500, etc. We will then compute the standard deviations for these simulated values and plot the standard deviations against the number of simulations and observe what is happening to the standard deviations as the number of simulations is increasing.

To simulate the samples from the distribution, repeat as in Example 2. Note: The author is using the information displayed in Figure 5.4 for the already simulated data from Example 2.

Figure 5.11 shows a portion of the output of the descriptive statistics for a set of simulations. In particular, observe the standard deviations (StDev). All of these values are approximately equal to 1.2. This is very close to the standard deviation value of 1.21 (to two decimal places).

Figure 5.11: Partial Descriptive Statistics output for a set of simulated values for Example 2

Next we use the edit and copy feature of the software to copy the information shown in Figure 5.11 and paste it in a worksheet as we did in Example 2. The purpose of this is to compute descriptive statistics for the standard deviations. The descriptive statistics output for the standard deviations is shown in Figure 5.12.

Figure 5.12: Descriptive Statistics for the simulated Standard Deviations

Observe that in this case, the mean of the standard deviations is 1.2139. This value is very close to 1.2140. If we had simulated more values and computed the mean of the standard deviations, the value would tend to get closer to 1.2140.

Figure 5.13 shows a plot of the simulated standard deviations and the distribution standard deviation of 1.2140. Observe that most of the values are close to a value of 1.2140.

Figure 5.13: Plot of the simulated standard deviations along with the Standard Deviation value of 1.2140 for the distribution
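A comparable experiment for the standard deviation can be sketched in Python. As before, this is an illustration rather than the MINITAB procedure: the seed is arbitrary, and `statistics.stdev` (the sample standard deviation with the n − 1 divisor) is assumed to play the role of the StDev column in the descriptive statistics output.

```python
import random
from statistics import mean, stdev

values = [1, 2, 3, 4, 5]
probs = [0.16, 0.26, 0.30, 0.17, 0.11]

random.seed(11)  # arbitrary seed so the run is repeatable

# Simulate samples of size 500, 1000, ..., 5000 and record each sample
# standard deviation.
sample_sds = {}
for size in range(500, 5001, 500):
    sample = random.choices(values, weights=probs, k=size)
    sample_sds[size] = stdev(sample)

for size, s in sample_sds.items():
    print(f"{size:5d}  {s:.4f}")

mean_of_sds = mean(list(sample_sds.values()))
print("mean of the standard deviations:", round(mean_of_sds, 4))
```

The simulated standard deviations should cluster around the distribution value of 1.2140, although the exact figures depend on the random draws.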

2. THE BINOMIAL DISTRIBUTION

Recall, a binomial experiment is a probability experiment with the following properties:

• There are two possible outcomes (success or failure) on each trial of the experiment.
• There is a fixed number of trials in the experiment, say n
• The probability of success, say p, is fixed from trial to trial in the experiment.
• The trials in the experiment are independent.

Binomial experiments occur quite frequently in the real world, and a model has been developed to help compute the probabilities associated with such experiments. If we are interested in the number of successes in a binomial experiment, then the associated random variable is called a binomial random variable.

If the random variable X is a binomial random variable, then the distribution of X can be generated in MINITAB by selecting Calc → Probability Distributions → Binomial.

Example 5: A recent report concluded that 34% of adults are overweight (over the ideal weight). A random sample of 25 adults is tested to determine whether they are overweight or not. We will let X represent the number of adults in the sample who are overweight.

1. Can we consider this situation to be a binomial experiment?

In order for this situation to be a binomial experiment, we need to determine whether the situation satisfies the four conditions of a binomial experiment.

• Since an adult can either be classified as overweight or not, then the condition of two possible outcomes on each trial is satisfied.
• Since we have a sample of size 25, then we will test a fixed number of adults to determine whether each is overweight or not. That is, we have a fixed number of trials n = 25.
• Since we are given that 34% (0.34) of adults are overweight, then each adult in the sample has a probability p = 0.34 of being overweight. That is, the probability of a successful outcome of being overweight is fixed from adult (trial) to adult (trial).
• Since each adult will be overweight or not independently of each other, the condition of independence is satisfied.

Since the four conditions for a binomial experiment are satisfied, then we can classify this situation as a binomial experiment and the number of successes X (number of adults who are overweight), as a binomial random variable.

2. Use MINITAB to construct a probability distribution for this experiment.

In order to use MINITAB to generate a probability distribution for this situation, first we need to enter the possible number of successes for this experiment. Note that the possible number of successes will range from 0 to 25. Rename column C1 as x-values and column C2 as P(x-values). You can manually enter the values 0, 1, 2, 3, …, 25 in C1 or you can generate them. To generate the values, select Calc → Make Patterned Data → Simple Set of Numbers and the Simple Set of Numbers dialog box will appear. Fill in the boxes as shown in Figure 5.14. Click on the OK button and the values 0 to 25 will be generated in the x-values (C1) column.

Figure 5.14: Simple Set of Numbers dialog box with entries

Next, in order to compute the binomial probabilities for the generated number of successes, select Calc → Probability Distributions → Binomial. Fill in the Binomial Distribution dialog box as shown in Figure 5.15. Observe that the Probability option is selected as well as the Input column option. Click on the OK button and the corresponding probabilities will be generated in the P(x-values) column or column C2.

Figure 5.15: Binomial Probability Dialog Box with entries

Figure 5.16 shows a partial output for the probabilities. For example, from Figure 5.16, P(X = 0) = 0.000031, P(X = 5) = 0.059376 etc.

Figure 5.16: Partial Output for the Binomial Probabilities

3. Use MINITAB to graphically display the probability distribution by using a projection graph.

To present a projection graph for the probability distribution (with the x-values in column C1 and the P(x-values) in column C2), select Graph → Plot and fill in the options in the Plot dialog box as shown in Figure 5.17.

Figure 5.17: Plot Options for the Distribution Projection Graph

Click on the OK button and the projection graph will be generated. This is displayed in Figure 5.18.

Observations from Figure 5.18:

• The graph is very slightly skewed to the right. For most practical purposes, one can assume that it is symmetrical.
• The majority of the probability for the distribution lies between x-values of 4 and 14.
• The largest probability for an observed x-value is when x = 8.
• The mean number of overweight adults for this distribution is μ = n × p = 25 × 0.34 = 8.5.
• The standard deviation for the number of overweight adults for this distribution is σ = √(n × p × (1 − p)) = √(25 × 0.34 × 0.66) = 2.37 (to two decimal places).
• The interval for two standard deviations from the mean is [8.5 − 2 × 2.37, 8.5 + 2 × 2.37] = [3.76, 13.24].
• Since the distribution may be assumed to be symmetrical and bell-shaped, the Empirical Rule suggests that approximately 95% of the x-values will lie within two standard deviations of the mean. That is, approximately 95% of the x-values will lie in the interval [3.76, 13.24].
• From Figure 5.16, the actual probability of an x-value from 4 to 13 inclusive (the interval [3.76 ≈ 4, 13.24 ≈ 13]) is 96.78%.

Figure 5.18: Projection graph for the Generated Binomial Probabilities
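The observations for Example 5 can be reproduced approximately with a short Python sketch. It is a cross-check under the same binomial model (n = 25, p = 0.34), not a replacement for the MINITAB output; the variable names are illustrative, and the interval endpoints are rounded to the nearest integers as in the text.

```python
from math import comb, sqrt

n, p = 25, 0.34

# Binomial probabilities P(X = x) for x = 0, 1, ..., 25.
pmf = [comb(n, x) * p**x * (1 - p)**(n - x) for x in range(n + 1)]

mu = n * p
sigma = sqrt(n * p * (1 - p))

# Two-standard-deviation interval, with endpoints rounded as in the text.
low, high = round(mu - 2 * sigma), round(mu + 2 * sigma)
within = sum(pmf[x] for x in range(low, high + 1))

# The x-value with the largest probability.
mode = max(range(n + 1), key=lambda x: pmf[x])

print(f"P(X = 0) = {pmf[0]:.6f}")
print(f"mean = {mu:.2f}, standard deviation = {sigma:.2f}")
print(f"most likely value: x = {mode}")
print(f"P({low} <= X <= {high}) = {within:.4f}")
```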

3. THE POISSON DISTRIBUTION

Recall, a Poisson experiment is a probability experiment with the following properties:

• The experiment consists of counting the number of successes during an interval of time (or area or volume).
• The probability that a success occurs in a given interval of time (or area or volume) is the same for all intervals of equal size.
• The probability of observing two or more successes in a sufficiently small interval (or volume or area) is essentially zero.
• The occurrence of a success in any interval is independent of that in any other interval of time (or area or volume).

Poisson experiments occur quite frequently in the real world, and a model has been developed to help compute the probabilities associated with such experiments. If we are interested in the number of successes in a Poisson experiment, then the associated random variable is called a Poisson random variable.

If the random variable X is a Poisson random variable, then the distribution of X can be generated in MINITAB by selecting Calc → Probability Distributions → Poisson.

Example 6: A Poisson distribution is used to model the number of trucks arriving at a weigh station per hour. Assume the weigh station is on an interstate where the traffic is free flowing and that the average number of trucks arriving per hour at the weigh station is 6.

1. Use MINITAB to construct a probability distribution for this situation. Illustrate for X = 0 to X = 20.

Note: Here we will assume that the process of observing arriving trucks at the weigh station to be a Poisson process.

Recall: The possible number of successes for a Poisson process will range from 0 to ∞ (infinity).

In order to use MINITAB to generate a probability distribution for this situation, first we need to enter the possible number of successes for this experiment. Although the possible number of successes for a Poisson process ranges from 0 to ∞ (infinity), we are only considering the number of successes from 0 to 20. Rename column C1 as x-values and column C2 as P(x-values). You can manually enter the values 0, 1, 2, 3, …, 20 in C1 or you can generate them. To generate the values, select Calc → Make Patterned Data → Simple Set of Numbers and the Simple Set of Numbers dialog box will appear. Fill in the appropriate numbers in the text boxes, using a dialog box similar to the one shown in Figure 5.14 (which relates to the binomial example). Click on the OK button and the values 0 to 20 will be generated in the x-values (C1) column.

Next, in order to compute the Poisson probabilities for the generated number of successes, select Calc → Probability Distributions → Poisson. Fill in the Poisson Distribution dialog box as shown in Figure 5.19. Observe that the Probability option is selected as well as the Input column option. Click on the OK button and the corresponding probabilities will be generated in the P(x-values) column or column C2.

Figure 5.19: Poisson Probability Dialog Box with entries

Figure 5.20 shows a partial output for the probabilities. For example, from Figure 5.20, P(X = 0) = 0.002479, P(X = 12) = 0.011264 etc.

Figure 5.20: Partial Output for the Poisson Probabilities

2. Use MINITAB to graphically display the probability distribution by using a projection graph.

Follow the procedure in the previous example to present the projection graph. The graph is shown in Figure 5.21.

Figure 5.21: Projection Graph for the Generated Poisson Probabilities

Observations from Figure 5.21:

• The graph is skewed to the right.
• The majority of the probability for the distribution lies between x-values of 1 and 14.
• The largest probability for an observed x-value is when x = 5 and x = 6. Both probabilities are equal to 0.160623. See Figure 5.20.
• The mean number of trucks that pull into the weigh station is μ = 6. Observe that this is one of the two x-values at which the highest probability occurs.
• The standard deviation for the number of trucks that pull into the weigh station for this Poisson distribution is σ = √μ = √6 = 2.45 (to two decimal places).
• The interval for two standard deviations from the mean is [6 − 2 × 2.45, 6 + 2 × 2.45] = [1.1, 10.9].
• Since the distribution is very roughly symmetric, then approximately 95% of the x-values will lie within two standard deviations of the mean. That is, approximately 95% of the x-values will lie in the interval [1.1, 10.9].
• From Figure 5.21, with the endpoints rounded to the nearest integers [1.1 ≈ 1, 10.9 ≈ 11], the actual probability of an x-value from 1 to 11 inclusive is 97.74%.
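The Poisson observations can be checked with a similar Python sketch, assuming the same model with mean 6 and the same range X = 0 to X = 20. The variable names are illustrative, and the interval endpoints are rounded to the nearest integers to follow the text.

```python
from math import exp, factorial, sqrt

mu = 6

# Poisson probabilities P(X = x) for x = 0, 1, ..., 20.
pmf = [mu**x * exp(-mu) / factorial(x) for x in range(21)]

sigma = sqrt(mu)

# Two-standard-deviation interval [1.1, 10.9], rounded to 1 and 11 as in the text.
low, high = round(mu - 2 * sigma), round(mu + 2 * sigma)
within = sum(pmf[x] for x in range(low, high + 1))

print(f"P(X = 0)  = {pmf[0]:.6f}")
print(f"P(X = 5)  = {pmf[5]:.6f}")
print(f"P(X = 6)  = {pmf[6]:.6f}")
print(f"P(X = 12) = {pmf[12]:.6f}")
print(f"standard deviation = {sigma:.2f}")
print(f"P({low} <= X <= {high}) = {within:.4f}")
```

Note that the rounding matters here: counting only the integers strictly inside [1.1, 10.9], that is 2 through 10, would give a smaller proportion.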

Figure 5.18: Projection graph for the Binomial Distribution

4. SIMULATING EXPERIMENTS INVOLVING DISCRETE DISTRIBUTIONS

In this section we will simulate from the binomial distribution and compare the theoretical probabilities with the empirical probabilities obtained through the simulation. We can, of course, replicate similar experiments for any discrete distribution.

Recall, MINITAB can be used to generate random data by selecting Calc → Random Data. This selection can be used with any of the distributions listed in the MINITAB options (e.g. Poisson, Integer, etc.).

Example 7: The owner of a pawnshop knows from past data that 80% of the customers who enter her shop will pawn an item. Currently, there are 15 customers in the shop. Use this information to compare the theoretical and empirical probabilities for the appropriate distribution.

NOTE:

• We will let X represent the number of customers who will pawn an item. Here X is a binomial random variable since:
• Each customer will either pawn or will not pawn an item. Thus we have two possible outcomes for each trial (customer).
• There is a fixed number of customers. In this case n = 15.
• The probability that a customer will pawn an item is p = 0.80.
• We will assume that each customer is pawning an item independently of the next customer.

First we will compute the exact binomial probabilities for X = 0, 1, 2, …, 15, then we will simulate from the binomial distribution to obtain corresponding empirical probabilities for different simulations. We will then compare the theoretical with the empirical probabilities through tables and graphs.

Place headings of x-value and P(x-value) in columns C1 and C2 respectively. Next, enter the values 0, 1, 2, …, 15 in column C1 by using Calc → Make Patterned Data → Simple Set of Numbers.

Next, select Calc → Probability Distributions → Binomial and the Binomial Distribution dialog box will appear. Fill in the information in the dialog box as shown in Figure 5.19.

Figure 5.19: Binomial Dialog Box with Appropriate Entries

When the OK button is selected, the binomial probabilities will be generated and saved in column C2.

The projection graph for this distribution is displayed in Figure 5.20.

Figure 5.20: Projection Graph for the Pawnshop Binomial Distribution

Note:

• This binomial distribution has a mean μ = n × p = 15 × 0.8 = 12 and a standard deviation σ = √(n × p × (1 − p)) = √(15 × 0.8 × 0.2) = √2.4 = 1.55 (to two decimal places).
• Observe that the distribution is skewed to the left.
• Observe that the peak of the distribution is at x = 12. This is the value of the mean.

We will next simulate 100, 200, 300, 400, 500, 600, 700, 800, 900, and 1000 values from this binomial distribution (n = 15, p = 0.80), which has mean μ = 12 and standard deviation σ = 1.55.

Label column C3 as Sim100, column C4 as Sim200, column C5 as Sim300, and so on. We will save the simulated values in the respective columns.

To simulate the data, select Calc → Random Data → Binomial and the Binomial Distribution dialog box will appear. Figure 5.21 shows the entries in order to simulate the first 100 values.

Figure 5.21: (Pawnshop) Binomial Distribution dialog box with appropriate entries to simulate 100 values from the distribution

Click on the OK button, and the values will be simulated and saved in the column named Sim100.

Figure 5.22 shows the projection graph for the simulated values. Included on the graph are the frequency counts for each of the simulated values. For instance, the value of 9 (number of customers who will pawn an item) occurred 6 times, the value of 12 occurred 34 times, etc.

Comparing Figures 5.20 and 5.22, the graph in Figure 5.22 is roughly similar in shape to that in Figure 5.20. The difference is due to the randomness of the selection procedure in the software.

Figure 5.22: Projection graph for the 100 Simulated values

Figure 5.23 shows the summary statistics for the 100 simulated values. Observe that the value of the mean is 11.94 ≈ 12 (the actual mean for the distribution) and the value of the standard deviation is 1.543 ≈ 1.55 (the actual standard deviation correct to two decimal places).

Figure 5.23: Descriptive Statistics for the 100 Simulated values

Actual Probabilities versus the Empirical Probabilities

To compute the empirical probabilities for the simulated distribution, consider Figure 5.22. Recall, the graph displays the number of times a value was simulated. So, in order for us to compute the empirical probability for each simulated value, all we need to do is to divide its frequency count by 100. For example, to compute the empirical probability for the simulated value of 7, we divide its frequency count of 1 by 100. As another example, to compute the empirical probability for the simulated value of 11, we divide its frequency count of 16 by 100.

Note: We are dividing the frequency count values by 100 since we replicated the experiment 100 times.

The following table displays the theoretical (actual) and empirical (simulated) probabilities.

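| x-values | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 |
|---|---|---|---|---|---|---|---|---|
| Theoretical P(x-value) | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.001 | 0.003 |
| Empirical P(x-value) | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.010 |

| x-values | 8 | 9 | 10 | 11 | 12 | 13 | 14 | 15 |
|---|---|---|---|---|---|---|---|---|
| Theoretical P(x-value) | 0.014 | 0.043 | 0.103 | 0.188 | 0.250 | 0.231 | 0.132 | 0.035 |
| Empirical P(x-value) | 0.030 | 0.060 | 0.030 | 0.160 | 0.340 | 0.240 | 0.120 | 0.010 |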

Note: There is a slight difference (error) between the empirical and theoretical (actual) probabilities.

Note: This experiment was replicated for the remaining simulations of 200, 300, …, 1000.
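The comparison of theoretical and empirical probabilities can be sketched in Python as follows. This is an illustration under stated assumptions, not the MINITAB procedure: the helper `simulate_binomial` is hypothetical, each binomial value is produced by counting successes in 15 simulated trials, and the seed is arbitrary, so the empirical column will not match the table above.

```python
import random
from collections import Counter
from math import comb

n, p = 15, 0.80

# Theoretical binomial probabilities for x = 0, 1, ..., 15.
theoretical = [comb(n, x) * p**x * (1 - p)**(n - x) for x in range(n + 1)]

def simulate_binomial(n, p, rng):
    """Return one binomial value: the number of successes in n independent trials."""
    return sum(rng.random() < p for _ in range(n))

rng = random.Random(2024)  # arbitrary seed so the run is repeatable
reps = 100                 # number of simulated values, as in the first simulation

# Frequency count of each simulated value, divided by the number of replications.
counts = Counter(simulate_binomial(n, p, rng) for _ in range(reps))
empirical = [counts[x] / reps for x in range(n + 1)]

print(" x  theoretical  empirical")
for x in range(n + 1):
    print(f"{x:2d}  {theoretical[x]:11.3f}  {empirical[x]:9.3f}")
```

Changing `reps` to 200, 300, …, 1000 gives the remaining simulations; with more replications the empirical probabilities would generally be expected to track the theoretical ones more closely.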

The projection graph for the 1000 simulations is shown in Figure 5.24. When this graph is compared with that in Figure 5.20, we can see they have similar shapes, except that in Figure 5.24 there are no probabilities for values 0 through 6. Essentially, the corresponding probabilities for the values 0 through 6 are all equal to zero. The shapes are similar because we have simulated a larger sample from the binomial population, hence Figure 5.24 will be more representative of the distribution.

Figure 5.24: Projection graph for the 1000 Simulated values

The following table shows the theoretical probability with the corresponding empirical probabilities for all the simulations.

Note: When you replicate the simulation, your numbers will not match because of the random nature of the simulation.
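
The repeated simulations can be imitated outside MINITAB with a short Python sketch. It assumes the same binomial parameters as the worked example appears to use (n = 15, p = 0.8, inferred from the mean of 12 and standard deviation of about 1.55) and an arbitrary seed; because of the random nature of the simulation, its output will not reproduce the figures in the table.

```python
import random
from collections import Counter

n, p = 15, 0.8          # assumed binomial parameters (mean 12, sd about 1.55)
rng = random.Random(5)  # arbitrary seed so the sketch is repeatable


def simulate_binomial(size: int) -> list[int]:
    """Simulate `size` binomial values by counting successes in n Bernoulli trials."""
    return [sum(rng.random() < p for _ in range(n)) for _ in range(size)]


empirical_by_size = {}
for size in range(100, 1001, 100):  # 100, 200, ..., 1000 simulated values
    counts = Counter(simulate_binomial(size))
    empirical_by_size[size] = {x: counts[x] / size for x in range(n + 1)}

# Print the empirical probabilities for x = 7, ..., 15 at each simulation size.
for size, probs in empirical_by_size.items():
    print(size, [round(probs[x], 3) for x in range(7, n + 1)])
```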

| x-values | Theo. | Emp. 100 | Emp. 200 | Emp. 300 | Emp. 400 | Emp. 500 | Emp. 600 | Emp. 700 | Emp. 800 | Emp. 900 | Emp. 1000 |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 |
| 1 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 |
| 2 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 |
| 3 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 |
| 4 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 |
| 5 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 |
| 6 | 0.001 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.003 | 0.000 | 0.000 |
| 7 | 0.003 | 0.010 | 0.000 | 0.003 | 0.005 | 0.000 | 0.000 | 0.006 | 0.004 | 0.003 | 0.005 |
| 8 | 0.014 | 0.030 | 0.015 | 0.023 | 0.015 | 0.012 | 0.020 | 0.001 | 0.011 | 0.014 | 0.013 |
| 9 | 0.043 | 0.060 | 0.040 | 0.006 | 0.038 | 0.052 | 0.025 | 0.041 | 0.044 | 0.051 | 0.043 |
| 10 | 0.103 | 0.030 | 0.105 | 0.097 | 0.103 | 0.102 | 0.093 | 0.106 | 0.094 | 0.103 | 0.094 |
| 11 | 0.188 | 0.160 | 0.140 | 0.163 | 0.178 | 0.190 | 0.188 | 0.217 | 0.165 | 0.186 | 0.180 |
| 12 | 0.250 | 0.340 | 0.265 | 0.257 | 0.240 | 0.232 | 0.297 | 0.220 | 0.269 | 0.253 | 0.254 |
| 13 | 0.231 | 0.240 | 0.255 | 0.200 | 0.240 | 0.244 | 0.220 | 0.243 | 0.240 | 0.219 | 0.260 |
| 14 | 0.132 | 0.120 | 0.140 | 0.173 | 0.140 | 0.132 | 0.115 | 0.110 | 0.146 | 0.139 | 0.123 |
| 15 | 0.035 | 0.010 | 0.040 | 0.023 | 0.043 | 0.036 | 0.042 | 0.047 | 0.025 | 0.031 | 0.028 |

The values can be difficult to compare as they appear in the table. To see the differences between the theoretical and empirical probabilities more easily, we will plot a few of these probabilities and display them on the same graph.

• Figure 5.23 shows the plot for the theoretical probabilities and the empirical probabilities for the 100 simulated values.
• Observe there are large differences between the actual (theoretical) and the simulated (empirical) probabilities.
• This may be explained by the fact that the sample size was small, so the sample is not very representative of the population of values for the number of customers who will pawn an item in the pawnshop.

Figure 5.23: Theoretical Probabilities superimposed on the Empirical probabilities for the 100 simulated values.

• Figure 5.24 shows the plot for the theoretical probabilities and the empirical probabilities for the 1000 simulated values.
• Observe that the differences between the actual (theoretical) and the simulated (empirical) probabilities are small.
• This may be explained by the fact that the sample size was large, so the sample is a very good representation of the population of values for the number of customers who will pawn an item in the pawnshop.

Figure 5.24: Theoretical Probabilities superimposed on the Empirical probabilities for the 1000 simulated values.

The following table gives the number of customers who will pawn an item in the pawnshop, the empirical probabilities, the theoretical probabilities and the absolute (positive) errors (the absolute difference between the theoretical and the empirical probabilities) for the 1000 simulated values.

Note: The table only includes the number of customers from 7 through 15 because the probabilities associated with the rest are zero (to three decimal places).

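| No. of Customers | 7 | 8 | 9 | 10 | 11 | 12 | 13 | 14 | 15 |
|---|---|---|---|---|---|---|---|---|---|
| Theo. Prob. | 0.003 | 0.014 | 0.043 | 0.103 | 0.188 | 0.250 | 0.231 | 0.132 | 0.035 |
| Emp. Prob. | 0.005 | 0.013 | 0.043 | 0.094 | 0.180 | 0.254 | 0.260 | 0.123 | 0.028 |
| Absolute Error | 0.002 | 0.001 | 0.000 | 0.009 | 0.008 | 0.004 | 0.029 | 0.009 | 0.007 |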

Observe that the absolute errors are very small; that is, the empirical probabilities are very close to the theoretical probabilities. This is another example illustrating the Law of Large Numbers. More than likely, these errors would become even smaller if the experiment were run with a larger number of simulations.
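
The absolute errors in the table can be reproduced with a short Python sketch. The probabilities are copied from the table for the 1000 simulated values; the variable names are illustrative.

```python
values = [7, 8, 9, 10, 11, 12, 13, 14, 15]
theoretical = [0.003, 0.014, 0.043, 0.103, 0.188, 0.250, 0.231, 0.132, 0.035]
empirical_1000 = [0.005, 0.013, 0.043, 0.094, 0.180, 0.254, 0.260, 0.123, 0.028]

# Absolute error = |theoretical probability - empirical probability|,
# rounded to three decimal places to match the table.
absolute_errors = {
    x: round(abs(t - e), 3)
    for x, t, e in zip(values, theoretical, empirical_1000)
}

largest = max(absolute_errors, key=absolute_errors.get)
print(absolute_errors)
print(f"Largest absolute error: {absolute_errors[largest]} at x = {largest}")
```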

NOTES

EXPLORATION #5: HOMEWORK ASSIGNMENT

Name: _____________________ Date: ______________________

Course #: ___________________ Instructor: _________________

1. Consider a discrete random variable X which has the following discrete probability distribution.

| x-value | 1 | 2 | 3 | 4 | 5 | 6 |
|---|---|---|---|---|---|---|
| P(X = x) | 0.09 | 0.095 | 0.11 | 0.156 | 0.234 | 0.315 |

(a) Using the procedures in Example 1, compute the theoretical mean, variance, and standard deviation for the distribution.

NOTE: Enter the x-values in column C1 and the probabilities in column C2. Use MINITAB to produce the mean in C3, the variance in C4, and the standard deviation in C5. You may change the column names if you wish: rename C1 as x-values, C2 as PROB, C3 as MEAN, C4 as VARIANCE, and C5 as STD.

Theoretical Mean: ________________________

Theoretical Variance: _________________________

Theoretical Standard Deviation: _________________________

(b) Construct a projection graph for this probability distribution. You may title your graph as PROJECTION GRAPH - PROBLEM 1(b).

(c) Provide a hard copy of the graph constructed in part (b).

(d) How would you describe the shape of the distribution (symmetric, skewed right, skewed left)? Discuss.

(e) Use MINITAB to generate random samples up to size 10,000 in increments of size 500, starting at size 500. That is, you will generate samples of size 500, 1000, 1500, 2000, 2500, etc. Follow the procedure as given in Example 2. Compute the means and standard deviations for these simulated values from the discrete distribution. NOTE: These sample means and sample standard deviations are called empirical values since they are obtained from simulated (sampled) values.

Fill in the following table with your computed values.

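| Sample size, n | Empirical sample mean, x̄ | Empirical sample standard deviation, s |
|---|---|---|
| 500 | | |
| 1000 | | |
| 1500 | | |
| 2000 | | |
| 2500 | | |
| 3000 | | |
| 3500 | | |
| 4000 | | |
| 4500 | | |
| 5000 | | |
| 5500 | | |
| 6000 | | |
| 6500 | | |
| 7000 | | |
| 7500 | | |
| 8000 | | |
| 8500 | | |
| 9000 | | |
| 9500 | | |
| 10000 | | |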

(f) From the table of computed values, discuss what you observe for the sample means as the sample size increases. Compare with the theoretical mean from part (a).

(g) From the table of computed values, discuss what you observe for the sample standard deviations as the sample size increases. Compare with the theoretical standard deviation from part (a).

(h) Construct projection graphs for the 500, 1000, 1500, and 2000 simulated values. Label the graphs so that the 500, 1000, 1500, and 2000 simulated values can be told apart. For example, the title for the graph created with the 500 simulated values could be PROJECTION GRAPH FOR THE 500 SIMULATED VALUES – PROBLEM 1(h).

NOTE: You can simply construct histograms for the simulated values, but select Project for the Display option to create the projection graphs.

(i) Provide hard copies of the graphs.

(j) Discuss any observations for these graphs. Observe what happens as the sample size gets large. Compare with the projection graph in part (c).
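
For readers who want to check their understanding of the calculations outside MINITAB, the following Python sketch shows the general idea on a deliberately different, hypothetical distribution (the number of heads in two fair coin tosses, not the distribution in Problem 1). It assumes the usual definitions: the mean is the sum of x·P(x) and the variance is the sum of (x − mean)²·P(x). The seed, the sample sizes, and the variable names are illustrative.

```python
import random
from math import sqrt

# Hypothetical distribution (NOT the one in Problem 1): heads in two fair coin tosses.
x_values = [0, 1, 2]
probabilities = [0.25, 0.50, 0.25]

# Theoretical mean, variance, and standard deviation.
mean = sum(x * p for x, p in zip(x_values, probabilities))
variance = sum((x - mean) ** 2 * p for x, p in zip(x_values, probabilities))
std_dev = sqrt(variance)
print(f"theoretical: mean={mean}, variance={variance}, sd={std_dev:.4f}")

# Empirical mean and sample standard deviation for increasing sample sizes.
rng = random.Random(1)  # arbitrary seed for repeatability
empirical = {}
for size in (500, 1000, 5000, 10000):
    sample = rng.choices(x_values, weights=probabilities, k=size)
    sample_mean = sum(sample) / size
    sample_sd = sqrt(sum((v - sample_mean) ** 2 for v in sample) / (size - 1))
    empirical[size] = (sample_mean, sample_sd)
    print(f"n={size}: mean={sample_mean:.4f}, sd={sample_sd:.4f}")
```

As the sample size grows, the empirical values should tend to settle near the theoretical ones, which is the behavior the table in this problem is designed to reveal.
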
2. The quality control manager at a manufacturing plant selects 100 items from the production line to check for defective items. It is known from past quality checks that the process produces an average of 5% defectives.

(a) Let X = number of defective items in the sample of size 100. Explain why X can be considered a binomial random variable. NOTE: You need to discuss how the process satisfies the four conditions for it to be a binomial experiment. Explain in detail.

(b) Note that, since X represents the number of successes, the possible values for X are 0, 1, 2, 3, …, 100.

• Enter these values in column C1 by using Calc → Make Patterned Data → Simple Set of Numbers. Enter the appropriate numbers in the text boxes in the Simple Set of Numbers dialog box to generate the values.
• Use the information for the sample size (100) and the probability of a defective item of 5% (0.05) to generate the corresponding probabilities and save them in column C2. To generate the probabilities for the values in C1, select Calc → Probability Distributions → Binomial.

NOTE: Make sure you select the Probability, the Input column (C1), and the Optional Storage (C2) options in the Binomial Distribution dialog box. Also, note that the Number of trials will be 100 and the Probability of success will be 0.05.

• Compute Cumulative probabilities and save in column C3. Just repeat the process except now you will select the Cumulative probability option in the Binomial distribution dialog box.

Use the information in columns C1 through C3 to help find these probabilities.

P(X = 25) = ________

P(X ≤ 35) = ________

P(X ≥ 60) = ________

P(X < 76) = ________

P(X > 55) = ________

P(30 ≤ X ≤ 75) = ________

NOTE: P(30 ≤ X ≤ 75) = P(X ≤ 75) – P(X ≤ 29) since 30 must be included in the total probability.
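
The rule in the NOTE, and the related rules for "at least", "less than", and "more than", can be sketched in Python as follows. This is only an illustration of how cumulative probabilities combine; the endpoints a = 3 and b = 7 are hypothetical and are not the ones asked for above, and the function names are illustrative.

```python
from math import comb


def binomial_pmf(x: int, n: int, p: float) -> float:
    """P(X = x) for a binomial random variable."""
    return comb(n, x) * p**x * (1 - p) ** (n - x)


def binomial_cdf(x: int, n: int, p: float) -> float:
    """P(X <= x); returns 0 when x is negative."""
    return sum(binomial_pmf(k, n, p) for k in range(0, min(x, n) + 1))


n, p = 100, 0.05
a, b = 3, 7  # illustrative endpoints, not the ones in the assignment

at_most_b = binomial_cdf(b, n, p)            # P(X <= b)
less_than_a = binomial_cdf(a - 1, n, p)      # P(X < a)  = P(X <= a - 1)
at_least_a = 1 - binomial_cdf(a - 1, n, p)   # P(X >= a) = 1 - P(X <= a - 1)
more_than_b = 1 - binomial_cdf(b, n, p)      # P(X > b)  = 1 - P(X <= b)
between = binomial_cdf(b, n, p) - binomial_cdf(a - 1, n, p)  # P(a <= X <= b)

print(f"P({a} <= X <= {b}) = {between:.4f}")
```

Subtracting P(X ≤ a − 1) rather than P(X ≤ a) keeps the value a itself inside the interval, which is the point the NOTE makes about including 30.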

3. Let X be the number of e-mails received per hour by an on-line business. Assume that X is a Poisson random variable with a mean of 16. Recall that the standard deviation for a Poisson random variable is obtained by taking the square root of the mean. That is, σ = √16 = 4.

(a) Use MINITAB to help find the following probabilities.

P(X = 25) = ________

P(X ≤ 15) = ________

P(X ≥ 16) = ________

P(X < 16) = ________

P(X > 15) = ________

P(10 ≤ X ≤ 15) = ________

(b) Simulate the number of e-mails received by the company for the next 10, 50, 100, 500, 1000, 5000, 10,000, 15,000, 20,000, and 30,000 hours. That is, generate 10, 50, 100, etc. random values from a Poisson distribution with a mean of 16. You may save these generated data in columns C1 through C10.

(c) Compute descriptive statistics for these simulated values and enter them in the table below.

| Sample size, n | Empirical sample mean, x̄ | Empirical sample standard deviation, s |
|---|---|---|
| 10 | | |
| 50 | | |
| 100 | | |
| 500 | | |
| 1000 | | |
| 5000 | | |
| 10,000 | | |
| 15,000 | | |
| 20,000 | | |
| 30,000 | | |

(d) From the table of computed values, discuss what you observe for the sample means as the sample size increases. Compare with the theoretical mean of 16 from part (a).

(e) From the table of computed values, discuss what you observe for the sample standard deviations as the sample size increases. Compare with the theoretical standard deviation of 4 from part (a).

(f) Construct projection graphs for the simulated values in part (b). Provide hard copies of these graphs. Discuss any observations from the graphs. In particular, discuss your observations from the graphs as the sample size increases.
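
The Poisson calculations in this problem can also be sketched in Python. The sketch below is an illustration under stated assumptions, not a substitute for the MINITAB steps: the cutoff of 20 in the example probability is hypothetical, the sample sizes are a shortened list, the seed is arbitrary, and the random values are drawn with Knuth's multiplication method, which is a swapped-in technique that works acceptably for a moderate mean such as 16.

```python
import random
import statistics
from math import exp, factorial, sqrt

mean = 16
std_dev = sqrt(mean)  # Poisson: standard deviation = square root of the mean


def poisson_pmf(x: int, mu: float) -> float:
    """P(X = x) for a Poisson random variable with mean mu."""
    return exp(-mu) * mu**x / factorial(x)


def poisson_cdf(x: int, mu: float) -> float:
    """P(X <= x)."""
    return sum(poisson_pmf(k, mu) for k in range(x + 1))


def poisson_draw(mu: float, rng: random.Random) -> int:
    """One Poisson value via Knuth's multiplication method."""
    limit, product, count = exp(-mu), rng.random(), 0
    while product > limit:
        product *= rng.random()
        count += 1
    return count


# Illustrative cutoff (not one of the assigned probabilities): P(X >= 20) = 1 - P(X <= 19).
print(f"P(X >= 20) = {1 - poisson_cdf(19, mean):.4f}")

rng = random.Random(16)  # arbitrary seed for repeatability
summary = {}
for size in (10, 100, 1000, 10000):
    sample = [poisson_draw(mean, rng) for _ in range(size)]
    summary[size] = (statistics.mean(sample), statistics.stdev(sample))
    print(f"n={size}: mean={summary[size][0]:.3f}, sd={summary[size][1]:.3f}")
```
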
4. This exploration will allow you to investigate some of the properties of the binomial distribution through simulations.

(a) Here we will simulate binomial values for a fixed number of trials but with varying probabilities. Each simulation will be done 500 times. That is, you will simulate 500 rows of the number of successes for the binomial situation. So, for n = 10 and p = 0.01, 0.05, 0.1, 0.3, 0.5, 0.7, 0.9, simulate 500 values for each combination of n and p and save them in columns C1 through C7.

Generate projection graphs for these simulated values. Provide titles that reflect which n and p combinations are used. For example, for the values in C1, you may title the graph as BINOMIAL PROJECTION GRAPH WITH n = 10 AND p = 0.01.

Discuss your observations from these graphs. In particular, what are your observations for the binomial distribution when n is small and p varies from a small value to a large value.

(b) Here we will simulate binomial values for a varying number of trials but with a fixed probability. Each simulation will be done 500 times. That is, you will simulate 500 rows of the number of successes for the binomial situation. So, for n = 5, 10, 20, 30, 50, 100, 200 and p = 0.05, simulate 500 values for each combination of n and p and save them in columns C1 through C7.

Generate projection graphs for these simulated values. Provide titles that reflect which n and p combinations are used.

Discuss your observations from these graphs. In particular, what are your observations for the binomial distribution when n is varying from a small value to a large value and p is small (0.05 in this case).

(c) Here we will simulate binomial values for a varying number of trials but with a fixed probability. Each simulation will be done 500 times. That is, you will simulate 500 rows of the number of successes for the binomial situation. So, for n = 5, 10, 20, 30, 50, 100, 200 and p = 0.5, simulate 500 values for each combination of n and p and save them in columns C1 through C7.

Generate projection graphs for these simulated values. Provide titles that reflect which n and p combinations are used.

Discuss your observations from these graphs. In particular, what are your observations for the binomial distribution when n is varying from a small value to a large value and p is 0.5.

(d) Here we will simulate binomial values for a varying number of trials but with a fixed probability. Each simulation will be done 500 times. That is, you will simulate 500 rows of the number of successes for the binomial situation. So, for n = 5, 10, 20, 30, 50, 100, 200 and p = 0.95, simulate 500 values for each combination of n and p and save them in columns C1 through C7.

Generate projection graphs for these simulated values. Provide titles that reflect which n and p combinations are used.

Discuss your observations from these graphs. In particular, what are your observations for the binomial distribution when n is varying from a small value to a large value and p is large (close to 1).
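
The kind of simulation described in this problem can be sketched in Python as a rough companion to the MINITAB steps. The sketch covers only the first scenario (n = 10 with varying p), uses an arbitrary seed, and prints a crude text version of a projection graph in which each asterisk stands for 10 simulated values; the function names are illustrative.

```python
import random
from collections import Counter

rng = random.Random(2024)  # arbitrary seed for repeatability


def simulate_binomial(n: int, p: float, size: int) -> list[int]:
    """Simulate `size` binomial values by counting successes in n Bernoulli trials."""
    return [sum(rng.random() < p for _ in range(n)) for _ in range(size)]


def text_projection_graph(sample: list[int], n: int) -> str:
    """One line per possible value: an asterisk for every 10 occurrences, then the count."""
    counts = Counter(sample)
    return "\n".join(
        f"{x:3d} | {'*' * (counts[x] // 10)} ({counts[x]})" for x in range(n + 1)
    )


n, size = 10, 500
samples = {
    p: simulate_binomial(n, p, size)
    for p in (0.01, 0.05, 0.1, 0.3, 0.5, 0.7, 0.9)
}

for p, sample in samples.items():
    print(f"BINOMIAL PROJECTION GRAPH WITH n = {n} AND p = {p}")
    print(text_projection_graph(sample, n))
```

Changing the dictionary so that p is fixed and n varies gives the remaining scenarios.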

Copyright © 1995 - 2010 Pearson Education. All rights reserved. Pearson Prentice Hall is an imprint of Pearson.