
Unit Three: Evidence: Data Analysis

On-line Lesson

3.0 Introduction to Data Analysis

Project design includes determining the most appropriate type of data analysis. Data analysis should begin as soon as the data is collected. Selecting the appropriate statistics for your study is based on understanding a few basic principles. The correct statistics allow the evaluator to describe the data and make comparisons between the groups in the study.

3.1 Data According to Measurement

Two kinds of numbers exist:

Continuous: data that can be measured along a continuum and can take on endless intermediate values (examples: weight and time).

Discrete: data that are noncontinuous and finite. There are no in-between numbers (example: the number of hikers on a trail was 89, not 89.5).

Levels of Data

Scales or Levels of Measurement

The scales of measurement are listed below in order, from the one that corresponds least to the real number system to the one that corresponds most.

	Nominal
	Ordinal
	Interval
	Ratio

Numbers mean different things in different situations. Numbers are assigned to objects according to rules. You need to distinguish clearly between the thing you are interested in and the number that symbolizes or stands for the thing. For example, you have had lots of experience with the numbers 2 and 4. You can state immediately that 4 is twice as much as 2. That statement is correct if you are dealing with numbers themselves, but it may or may not be true when those numbers are symbols for things.

The statement is true if the numbers refer to apples; four apples are twice as many as two apples. However, the statement is not true if the numbers refer to the order in which runners finish a race. Fourth place is not twice anything in relation to second place, neither twice as slow nor twice as far behind the first-place runner. The point is that the numbers 2 and 4 are used to refer to both apples and finish places in a race, but the numbers mean different things in those two situations.

Nominal Level

The nominal scale is not a scale at all in the usual sense. In the nominal scale, numbers are used simply as names and have no real quantitative value. It is the scale used for qualitative variables. Numerals on sports uniforms are an example; here, 45 is different from 32, but that is about all we can say. The person represented by 45 is not "more than" the person represented by 32, and certainly it would be meaningless to try to add 45 and 32.

Designating different colors, different sexes, or different political parties by numbers will produce nominal scales. With a nominal scale, you can even reassign the numbers and still maintain the original meaning, which is only that things with different numbers are different. Of course, all things that are alike have the same number.

Ordinal Level

The ordinal scale has the characteristic of the nominal scale (different numbers mean different things) plus the characteristic of indicating "greater than" or "less than." In the ordinal scale, the object with the number 3 has less or more of something than the object with the number 5. Finish places in a race are an example of an ordinal scale. The runners finish in rank order, with "1" assigned to the winner, "2" to the runner-up, and so on. Here, 1 means less time than 2, but the numbers do not say how much less.

Interval Level

The interval scale has properties of both the ordinal and nominal scales, plus the additional property that intervals between the numbers are equal. "Equal interval" means that the distance between the things represented by "2" and "3" is the same as the distance between the things represented by "3" and "4."

The centigrade thermometer is based on an interval scale. The difference in temperature between 10° and 20° is the same as the difference between 40° and 50°. The centigrade thermometer, like all interval scales, has an arbitrary zero point.

With interval data, we have one restriction: we may not make simple ratio statements. We may not say that 100° is twice as hot as 50° or that a person with an IQ of 60 is half as intelligent as a person with an IQ of 120. In park, recreation and leisure services, the most common form of interval measurement is a Likert Scale.
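
The restriction on ratio statements can be checked with a little arithmetic. The sketch below is an illustration only: it converts the centigrade readings to Kelvin (a scale with a true zero, which is an assumption brought in for the comparison and is not part of the lesson) and shows that the difference between the two temperatures survives the change of scale while the "twice as hot" ratio does not.

```python
# Centigrade is an interval scale (arbitrary zero point).
# Kelvin is used here, as an outside assumption, as a scale with a true zero.

def celsius_to_kelvin(c):
    return c + 273.15

low_c, high_c = 50.0, 100.0

# Differences mean the same thing on both scales.
diff_c = high_c - low_c
diff_k = celsius_to_kelvin(high_c) - celsius_to_kelvin(low_c)

# Ratios do not: the apparent "2x" depends on where zero was placed.
ratio_c = high_c / low_c
ratio_k = celsius_to_kelvin(high_c) / celsius_to_kelvin(low_c)

print("difference:", diff_c, "vs", round(diff_k, 2))
print("ratio:     ", ratio_c, "vs", round(ratio_k, 3))
```

The ratio computed from the centigrade numbers is 2, but the same two temperatures give a ratio of roughly 1.15 once zero means "no heat at all," which is why "100° is twice as hot as 50°" is not a valid statement.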

Ratio Level

The ratio scale has all the characteristics of the nominal, ordinal, and interval scales, plus one: it has a true zero point, which indicates a complete absence of the thing measured. On a ratio scale, zero means "none." Height, weight, and time are measured with ratio scales. Zero height, zero weight, and zero time mean that no amount of these variables is present. With a true zero point, you can make ratio statements like "16 kilograms is four times as heavy as 4 kilograms."

Describing Variables

In order to conduct any research study, some form of measurement must be used. The most common forms of measurement used in park, recreation and leisure services research are explained below.

Variable: Any characteristic that can have more than one form or value. Examples of variables are gender and height. Gender has two groups or values; height has numerous values.

Measurement: The assignment of numbers on the basis of variation.

Score: A number.

Three Types of Variables

	Independent
	Dependent
	Control

Independent Variables

Independent variables are also described as predictor, input, manipulated, treatment, stimulus, intervention, experimental, or moderating variables. The independent variable is presumed to cause, affect, or influence the outcome measures of a research study.

The research method creates a situation where subjects are exposed to a condition or problem, which is the independent variable. The subject's response to the condition or problem is the dependent variable.

An example is a study to determine whether age, sex, and occupation (independent variables) positively or negatively affect attitudes toward leisure time (dependent variable).

Dependent Variables

Dependent variables represent the effect or influence of the independent variable. They are sometimes referred to as outcome, output, or response variables. They are "dependent" in that the outcome depends on the effects of the variables being manipulated.

Control Variables

Control variables are also called background or classification variables because they need to be controlled, held constant, or randomized so that their effects are neutralized or accounted for during the study.

Extraneous variables can be controlled or ruled out using the following processes:

Random assignment of subjects,
The ability to manipulate the instrument or test,
The time when the measurements of the dependent variable occur (randomized data collection times are necessary), and
Which groups are measured.
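
Random assignment, the first process in the list above, can be sketched in a few lines. This is a minimal illustration, assuming 20 hypothetical subject IDs and two equal groups; the fixed seed is there only so the example is repeatable and would normally be left out.

```python
import random

random.seed(42)  # fixed seed only so the sketch gives the same split each run

# Hypothetical subject IDs S01 ... S20.
subjects = [f"S{i:02d}" for i in range(1, 21)]

# Shuffle a copy of the list, then split it in half.
shuffled = random.sample(subjects, k=len(subjects))
half = len(shuffled) // 2
treatment_group = shuffled[:half]
control_group = shuffled[half:]

print("treatment:", treatment_group)
print("control:  ", control_group)
```

Because chance alone decides who lands in each group, extraneous characteristics such as age or prior experience should, on average, be spread evenly across the two groups.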

Characteristics of Leisure and Recreation Surveys

Standard Independent variables: gender, age, income, ethnicity, household characteristics, and place of residence.

Primary Dependent variables: attitude, interests, and questions relating to park, recreation and leisure services issues.

Three Types of Analysis

	Univariate analysis is the examination of the distribution of cases on only one variable (including mean, median, and mode).
	Bivariate analysis is the simultaneous evaluation of two variables, one independent and one dependent.
	Multivariate analysis uses two or more independent variables and one dependent variable.

3.2 Getting Your Data Together: Organizing and Coding Quantitative Data

Coding involves organizing the variables used in a measurement instrument to collect data. Coding is necessary for computer analysis.

Values are numbers assigned to data from a variable.

Coding systems are arbitrary, so a codebook is created by the evaluator to identify the variables and their associated codes. Codebooks include the coding procedures and can be created at any stage in the evaluation project. A codebook will include the variable name, the value labels for each variable, and the corresponding question from the questionnaire. See table 3.2 (2) page 268 in the text.

Data coding is essential to the process of data analysis. It uses either:

	precoding, building the numbers corresponding to responses into the questionnaire, or
	postcoding, assigning numbers to response categories at the time of data entry.

The specific number associated with each response often depends on the type of scale being used in the questionnaire.

Other Coding Issues

Almost every data collector will have situations where some data are missing or where the respondent did not understand the directions and ended up with inappropriate responses. Maybe the respondent did not want to answer a certain question. Rather than throw out the entire questionnaire because some information is missing or incorrectly indicated, you can devise a system for handling missing data.

You may select a particular code to represent missing information.
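
A codebook and a missing-data code can be sketched as a small lookup table. Everything specific in the example is hypothetical: the two questions, the value labels, and the choice of 9 as the missing-data code are assumptions made for illustration, not the scheme used in the text.

```python
# Assumed code for missing or unusable answers (any unused number would do).
MISSING = 9

# Hypothetical codebook: variable name -> question text and value labels.
codebook = {
    "gender": {
        "question": "Q1. What is your gender?",
        "values": {1: "female", 2: "male", MISSING: "missing"},
    },
    "satisfaction": {
        "question": "Q2. How satisfied were you with the trail?",
        "values": {1: "very dissatisfied", 2: "dissatisfied", 3: "neutral",
                   4: "satisfied", 5: "very satisfied", MISSING: "missing"},
    },
}

def encode(variable, raw_answer):
    """Postcode one raw answer; anything not in the codebook becomes MISSING."""
    labels = codebook[variable]["values"]
    lookup = {label: code for code, label in labels.items()}
    if raw_answer is None:
        return MISSING
    return lookup.get(raw_answer.strip().lower(), MISSING)

raw = [
    {"gender": "Female", "satisfaction": "satisfied"},
    {"gender": "male", "satisfaction": None},          # question skipped
    {"gender": "female", "satisfaction": "4 and 5"},   # inappropriate response
]

coded = [{var: encode(var, row[var]) for var in codebook} for row in raw]
print(coded)
```

The second and third questionnaires are kept, with the unusable satisfaction answers recorded as 9 so that they can be excluded from calculations on that one variable.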

3.3 Univariate Statistical Analyses: Describing What Is

The Basics of Descriptive Analysis

Descriptive statistics describe and summarize the characteristics of your data. Any form of data may provide descriptive information.

Frequency Counts and Percentages

Frequency is the number of times a score occurs in a distribution (item).

Percentage is the number of responses in a specific category divided by the total number of respondents for a particular question, multiplied by 100.
Example: 22 redheads + 51 brunettes + 15 blondes = 88 respondents (N = 88), so 22 redheads divided by 88 = .25, or 25%.
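
The hair-color example can be reproduced with a frequency count. This is a minimal sketch using Python's standard library; the list of responses is simply the example's counts expanded into individual answers.

```python
from collections import Counter

# One entry per respondent, matching the counts in the example.
responses = ["redhead"] * 22 + ["brunette"] * 51 + ["blonde"] * 15

frequency = Counter(responses)   # how many times each answer occurs
n = sum(frequency.values())      # total respondents for the question

# Percentage = category frequency / total respondents * 100.
percentage = {hair: 100 * count / n for hair, count in frequency.items()}

for hair, count in frequency.most_common():
    print(f"{hair:10s} {count:3d} {percentage[hair]:5.1f}%")
```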

Central Tendency gives one score or measure that represents an entire group.

Mean: the sum of the scores divided by the number of scores. Also known as the average.
Median: the point that divides a distribution of scores exactly in half.
Mode: the most frequently occurring score (the score with the greatest frequency).
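
The three measures can give different answers for the same data. The sketch below uses Python's built-in statistics module on a small set of made-up scores (the scores are an assumption for illustration only).

```python
import statistics

# Hypothetical scores with a few high values pulling the mean upward.
scores = [1, 2, 2, 3, 4, 7, 9]

mean = statistics.mean(scores)      # sum of scores / number of scores
median = statistics.median(scores)  # middle score once the scores are sorted
mode = statistics.mode(scores)      # most frequently occurring score

print("mean:", mean, "median:", median, "mode:", mode)
```

Here the mean is 4, the median is 3, and the mode is 2, so the choice of measure changes which "typical" score gets reported.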

Variations in Data Characteristics

Range is one of the easiest calculations of dissimilarity or dispersion. Range is the difference between the maximum and minimum observed values.

Variance is a measure of dispersion that is based on all observations and describes the extent to which the scores differ from each other. It is the average squared distance of the scores from the mean.

Standard Deviation is the square root of the variance and indicates the typical distance of the scores from the mean. The more the scores cluster around the mean (the smaller the standard deviation), the more the evaluator can conclude that everyone is performing at about the same level or answering questions similarly on a questionnaire.
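
The three dispersion measures can be computed as follows. This is a minimal sketch on made-up scores; it uses the population formulas (dividing by N), with the sample version (dividing by N - 1) shown for comparison, since which one applies depends on whether the scores are a whole population or a sample.

```python
import statistics

# Hypothetical scores; their mean is 5.
scores = [2, 4, 4, 4, 5, 5, 7, 9]

score_range = max(scores) - min(scores)

# Population variance: average squared distance from the mean.
variance = statistics.pvariance(scores)

# Standard deviation: square root of the variance, in the original units.
std_dev = statistics.pstdev(scores)

# Sample variance divides by N - 1 instead of N.
sample_variance = statistics.variance(scores)

print("range:", score_range)
print("variance:", variance, "standard deviation:", std_dev)
print("sample variance:", round(sample_variance, 3))
```

For these scores the range is 7, the population variance is 4, and the standard deviation is 2.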

The Distribution Curve (What Do My Data Look Like?)

Normal Distribution (Bell Curve): with a representative sample, scores will cluster around the middle of the distribution, and the frequency of scores will decrease as you move away from the central concentration. The normal distribution is the most important theoretical distribution in statistics. In a normal distribution, about 68% of the observations fall within 1 standard deviation of the mean and about 95% of all observations fall within 2 standard deviations of the mean. See 3.3 (2) on page 274 of the text.
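
The 68% and 95% figures can be checked directly from the normal curve. The sketch below uses NormalDist from Python's statistics module with a mean of 0 and a standard deviation of 1; any other mean and standard deviation would give the same proportions.

```python
from statistics import NormalDist

curve = NormalDist(mu=0, sigma=1)  # standard normal distribution

# Proportion of observations between -1 and +1 standard deviations.
within_1_sd = curve.cdf(1) - curve.cdf(-1)

# Proportion of observations between -2 and +2 standard deviations.
within_2_sd = curve.cdf(2) - curve.cdf(-2)

print(f"within 1 SD: {within_1_sd:.1%}")
print(f"within 2 SD: {within_2_sd:.1%}")
```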

Skewness describes a distribution that is not symmetric and has a "tail." The skew is positive if the tail extends toward the larger values and negative if the tail extends toward the smaller values.
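
The sign of the skew can be computed as well as seen. The sketch below uses one common formula, the third moment about the mean divided by the variance raised to the power 1.5; statistical packages may apply slightly different sample corrections, so treat the exact values as illustrative. The data sets are made up.

```python
import statistics

def skewness(values):
    """Moment-based skewness: positive for a right tail, negative for a left tail."""
    mean = statistics.fmean(values)
    n = len(values)
    second_moment = sum((x - mean) ** 2 for x in values) / n
    third_moment = sum((x - mean) ** 3 for x in values) / n
    return third_moment / second_moment ** 1.5

right_tailed = [1, 2, 2, 3, 3, 3, 4, 12]    # one very large value
left_tailed = [-x for x in right_tailed]    # mirror image of the same data
symmetric = [1, 2, 3, 4, 5]

print("right tail:", round(skewness(right_tailed), 2))
print("left tail: ", round(skewness(left_tailed), 2))
print("symmetric: ", round(skewness(symmetric), 2))
```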

3.4 The Word on Statistical Significance and Its Meaning

Statistical significance refers to the unlikeliness that relationships observed in the sample could be attributed to chance alone. In the social sciences, statistical significance by convention is usually set at probability (p) < .05. This means that, if there were really no relationship or difference, results like those observed would be expected to occur by chance fewer than 5 times out of 100. If results indicate the independent variable was the cause of a difference, but the difference actually occurred by chance, this is a Type I error. If the data show no significant difference and there actually is a difference, this is a Type II error.
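
A small simulation can make the Type I error concrete. In the sketch below, both groups are drawn from the same population, so any "significant" difference is due to chance alone. The test used is a simple z-test with a known standard deviation, chosen only to keep the example self-contained; the group size, number of trials, and seed are all assumptions.

```python
import random
from statistics import NormalDist, fmean

random.seed(0)      # fixed seed so the sketch is repeatable

ALPHA = 0.05        # conventional significance level
SIGMA = 1.0         # population standard deviation, assumed known
N = 30              # subjects per group
TRIALS = 2000       # number of simulated studies

std_normal = NormalDist()

def two_sided_p(group_a, group_b):
    """p-value of a z-test for a difference in means (known SIGMA)."""
    standard_error = SIGMA * (2 / N) ** 0.5
    z = (fmean(group_a) - fmean(group_b)) / standard_error
    return 2 * (1 - std_normal.cdf(abs(z)))

false_positives = 0
for _ in range(TRIALS):
    # Both groups come from the same population: there is no real difference.
    a = [random.gauss(0, SIGMA) for _ in range(N)]
    b = [random.gauss(0, SIGMA) for _ in range(N)]
    if two_sided_p(a, b) < ALPHA:
        false_positives += 1

type_i_rate = false_positives / TRIALS
print("share of studies that were 'significant' by chance:", type_i_rate)
```

The share printed should land close to .05, which is what setting p < .05 means: about 5 in 100 studies of a nonexistent effect will still come out "significant."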

The accepted "margin of error" for opinion and behavior surveys is 5% (.05).

Effect size is independent of sample size. Effect size quantifies the size of the difference between groups. If one group is experimental and one group is control and a statistically significant difference is found, then the effect size is a measure of the effectiveness of the treatment.
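
One widely used effect size for two groups is Cohen's d, the difference between the group means divided by the pooled standard deviation. The lesson does not name a specific measure, so Cohen's d is an assumption here, and the scores are made up for illustration.

```python
import statistics

def cohens_d(treatment, control):
    """Difference in means expressed in pooled standard deviation units."""
    n1, n2 = len(treatment), len(control)
    v1 = statistics.variance(treatment)   # sample variance (N - 1)
    v2 = statistics.variance(control)
    pooled_sd = (((n1 - 1) * v1 + (n2 - 1) * v2) / (n1 + n2 - 2)) ** 0.5
    return (statistics.fmean(treatment) - statistics.fmean(control)) / pooled_sd

# Hypothetical leisure-satisfaction scores.
treatment_scores = [6, 7, 7, 8, 9]
control_scores = [4, 5, 6, 6, 7]

d = cohens_d(treatment_scores, control_scores)
print("Cohen's d:", round(d, 2))
```

Because d is expressed in standard deviation units rather than as a p-value, it does not grow simply because more subjects were added to the study.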

3.5 Inferential Statistics: The Plot Thickens

Inferential statistics help determine if relationships, associations, or differences exist within data and whether statistical significance exists. Inferential statistics allow researchers to generalize the results to a larger population.

Parametric statistics are based on the assumptions of a normal distribution and randomized sampling, with data measured at the interval or ratio level. Parametric statistical tests include t-tests, Pearson product-moment correlations, and analysis of variance (ANOVA).

Nonparametric statistics are also called distribution-free tests because they are not based on the assumption of the normal probability curve. Nonparametric tests include chi-square analysis, Mann-Whitney U, Wilcoxon matched-pairs signed ranks, Friedman, and Spearman rank-order correlation coefficient. Nonparametric tests are generally less powerful than parametric tests when the parametric assumptions are met.

Associations Among Variables

	Chi-square (crosstabs)
	Spearman and Pearson Correlations
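
A chi-square statistic for a crosstab can be computed by hand from observed and expected counts. The sketch below is an illustration on a made-up 2 x 2 table (for example, gender by whether the respondent used a park); the critical value of 3.841 for 1 degree of freedom at the .05 level comes from a standard chi-square table.

```python
# Hypothetical crosstab: rows are two groups, columns are "yes" / "no" answers.
observed = [
    [30, 20],
    [15, 35],
]

def chi_square(table):
    """Return the chi-square statistic and degrees of freedom for a crosstab."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, obs in enumerate(row):
            # Count expected in this cell if the two variables were unrelated.
            expected = row_totals[i] * col_totals[j] / n
            statistic += (obs - expected) ** 2 / expected
    dof = (len(row_totals) - 1) * (len(col_totals) - 1)
    return statistic, dof

statistic, dof = chi_square(observed)
CRITICAL_05_DF1 = 3.841  # table value for 1 degree of freedom at p = .05

print("chi-square:", round(statistic, 2), "df:", dof)
print("significant at .05:", statistic > CRITICAL_05_DF1)
```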

Tests of Differences Between and Among Groups

Parametric

	t-tests (independent t-test or dependent t-test)
	Analysis of Variance (ANOVA)
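
The independent t-test compares the means of two separate groups. The sketch below computes the equal-variance (pooled) version by hand on made-up scores; a statistical package would also report the exact p-value, whereas here the result is compared with the table value of 2.306 for 8 degrees of freedom (two-tailed, .05 level).

```python
import statistics

def independent_t(group_a, group_b):
    """Pooled-variance t statistic and degrees of freedom for two groups."""
    n1, n2 = len(group_a), len(group_b)
    v1 = statistics.variance(group_a)   # sample variance (N - 1)
    v2 = statistics.variance(group_b)
    dof = n1 + n2 - 2
    pooled_variance = ((n1 - 1) * v1 + (n2 - 1) * v2) / dof
    standard_error = (pooled_variance * (1 / n1 + 1 / n2)) ** 0.5
    t = (statistics.fmean(group_a) - statistics.fmean(group_b)) / standard_error
    return t, dof

# Hypothetical scores for an experimental group and a control group.
experimental = [6, 7, 7, 8, 9]
control = [4, 5, 6, 6, 7]

t, dof = independent_t(experimental, control)
CRITICAL_05_DF8 = 2.306  # two-tailed table value for 8 degrees of freedom

print("t:", round(t, 3), "df:", dof)
print("significant at .05:", abs(t) > CRITICAL_05_DF8)
```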

Nonparametric

	Mann-Whitney U
	Sign
	Wilcoxon Signed Ranks
	Kruskal-Wallis
	Friedman Analysis of Variance

3.6 Hurray for Computers and Data Analyses

Options for computerized data analyses are now common and easily accessible to most researchers.

Popular Statistical Packages include:

	Microsoft Excel
	SAS
	SPSS

Web Reference: Statistical Software List

3.7 Qualitative Data Analysis and Interpretation: Explaining the What, How, and Why

In qualitative data analysis and interpretation, the evaluator tries to uncover perspectives that will aid in understanding the criteria being investigated.

In-depth interviews, focus groups, and qualitative observations are the most common methods to obtain qualitative data.

Analysis is the process of bringing order to qualitative data and organizing words into patterns, categories, and basic descriptive units.

Organizing Qualitative Analysis

Data reduction is the process of organizing data and developing categories for analysis. This process requires judgement calls and decision rules that the evaluator develops in summarizing and resummarizing the data.

	Qualitative coding includes open, axial, and selective coding. Coding ranges from descriptive (open coding) to interpretive (axial coding) to explanatory (selective coding) based on the stages of coding.
	Write-ups indicate the most important content of a study at a particular time.

Displaying the Data

Data display is an organized assembly of information that permits the drawing of conclusions and the presentation of the respondents' words.

	matrix
	visual map

Techniques for Data Analysis

Qualitative analysis is applied to the data once it is organized, open coded and/or diagrammed.

Two Major Techniques

	Enumeration strategies are used to supplement descriptive data.
	Grounded Comparison is used for recording, coding, and analyzing qualitative data.
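
An enumeration strategy amounts to counting how often coded ideas appear. The sketch below is one possible illustration: the respondents and the codes ("crowding," "safety," "signage") are hypothetical, and it counts both total mentions and the number of different respondents who raised each code.

```python
from collections import Counter

# Hypothetical open-coded interview segments: (respondent, code).
coded_segments = [
    ("respondent_1", "crowding"),
    ("respondent_1", "safety"),
    ("respondent_2", "crowding"),
    ("respondent_2", "crowding"),
    ("respondent_3", "signage"),
    ("respondent_3", "crowding"),
]

# Total number of times each code was applied.
mentions = Counter(code for _, code in coded_segments)

# Number of different respondents who raised each code.
respondents_per_code = {
    code: len({who for who, c in coded_segments if c == code})
    for code in mentions
}

for code, count in mentions.most_common():
    print(f"{code:10s} mentions: {count}  respondents: {respondents_per_code[code]}")
```

The counts supplement the descriptive data; they show how widespread an idea is but do not replace the respondents' own words.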

Making Interpretations

When all data are collected and analyzed, the evaluator has to draw conclusions and make recommendations based on their interpretation of the results. All data must be double checked to support or exemplify the findings. The data will be used to illustrate how the conclusions were drawn. Conclusions will use examples of "emic" ideas expressed by the respondents and "etic" ideas expressed in the evaluator's language. The evaluator will use quotes and anecdotes to tell the story from the data.
