3.0 Introduction to Data Analysis
Project design includes determining the most appropriate type of data analysis. Data analysis should begin as soon as the data is collected. Selecting the appropriate statistics for your study is based on understanding a few basic principles. The correct statistics allow the evaluator to describe the data and make comparisons between the groups in the study.
3.1 Data According to Measurement
Two kinds of numbers exist:
Continuous: data that can be measured along a continuum and can take on any intermediate value (examples: weight and time).
Discrete: data that are noncontinuous and finite. There are no in-between numbers (example: the number of hikers on a trail was 89, not 89.5).
Levels of Data
Scales or Levels of Measurement
The scales of measurement are described below in order, from the scale that corresponds least closely to the real number system (nominal) to the scale that corresponds most closely (ratio).
Numbers mean different things in different situations. Numbers are assigned to objects according to rules. You need to distinguish clearly between the thing you are interested in and the number that symbolizes or stands for the thing. For example, you have had lots of experience with the numbers 2 and 4. You can state immediately that 4 is twice as much as 2. That statement is correct if you are dealing with numbers themselves, but it may or may not be true when those numbers are symbols for things.
The statement is true if the numbers refer to apples; four apples are twice as many as two apples. However, the statement is not true if the numbers refer to the order that runners finish in a race. Fourth place is not twice anything in relation to second place: not twice as slow or twice as far behind the first-place runner. The point is that the numbers 2 and 4 are used to refer to both apples and finish places in a race, but the numbers mean different things in those two situations.
The nominal scale is not a scale at all in the usual sense. In the nominal scale, numbers are used simply as names and have no real quantitative value. It is the scale used for qualitative variables. Numerals on sports uniforms are an example; here, 45 is different from 32, but that is about all we can say. The person represented by 45 is not "more than" the person represented by 32, and certainly it would be meaningless to try to add 45 and 32.
Designating different colors, different sexes, or different political parties by numbers will produce nominal scales. With a nominal scale, you can even reassign the numbers and still maintain the original meaning, which is only that things with different numbers are different. Of course, all things that are alike have the same number.
The ordinal scale has the characteristic of the nominal scale (different numbers mean different things) plus the characteristic of indicating "greater than" or "less than." In the ordinal scale, the object with the number 3 has less or more of something than the object with the number 5. Finish places in a race are an example of an ordinal scale. The runners finish in rank order, with "1" assigned to the winner, "2" to the runner-up, and so on. Here, 1 means less time than 2. House numbers are another example of an ordinal scale.
The interval scale has properties of both the ordinal and nominal scales, plus the additional property that intervals between the numbers are equal. "Equal interval" means that the distance between the things represented by "2" and "3" is the same as the distance between the things represented by "3" and "4."
The centigrade thermometer is based on an interval scale. The difference in temperature between 10° and 20° is the same as the difference between 40° and 50°. The centigrade thermometer, like all interval scales, has an arbitrary zero point.
With interval data, we have one restriction: we may not make simple ratio statements. We may not say that 100° is twice as hot as 50° or that a person with an IQ of 60 is half as intelligent as a person with an IQ of 120. In park, recreation and leisure services, the most common form of interval measurement is a Likert Scale.
The ratio scale has all the characteristics of the nominal, ordinal, and interval scales, plus one: it has a true zero point, which indicates a complete absence of the thing measured. On a ratio scale, zero means "none." Height, weight, and time are measured with ratio scales. Zero height, zero weight, and zero time mean that no amount of these variables is present. With a true zero point, you can make ratio statements like "16 kilograms is four times as heavy as 4 kilograms."
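As a rough illustration of why ratio statements fail on an interval scale, the sketch below converts centigrade readings to kelvins, a temperature scale with a true zero. The function name is only illustrative and is not part of the course text.

```python
def celsius_to_kelvin(celsius):
    """Convert a centigrade reading to kelvins (a scale with a true zero)."""
    return celsius + 273.15

# Differences are meaningful on an interval scale.
same_gap = (20 - 10) == (50 - 40)

# The naive ratio on the centigrade scale suggests "twice as hot" ...
naive_ratio = 100 / 50

# ... but measured from a true zero the ratio is much smaller.
true_ratio = celsius_to_kelvin(100) / celsius_to_kelvin(50)

print(same_gap)              # True
print(naive_ratio)           # 2.0
print(round(true_ratio, 3))  # 1.155
```

The equal gaps hold on either scale, but only the scale with a true zero supports the "times as much" comparison.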
In order to conduct any research study, some form of measurement must be used. The most common forms of measurement used in park, recreation and leisure services research are explained below.
Variable: Any characteristic that can take more than one form or value. Examples of variables are gender and height. Gender has two groups or values; height has numerous values.
Measurement: The assignment of numbers on the basis of variation.
Score: A number that results from measurement.
Three Types of Variables
Independent variables are also described as predictor, input, manipulated, treatment, stimulus, intervention, experimental, or moderating variables. The independent variable is presumed to cause, affect, or influence the outcome measures of a research study.
The research method creates a situation where subjects are exposed to a condition or problem, which is called the independent variable. The subject's response to the condition or problem is the dependent variable.
An example is a study to determine whether age, sex, and occupation (independent variables) positively or negatively affect attitudes toward leisure time (dependent variable).
Dependent variables represent the effect or influence of the independent variable. They are sometimes referred to as outcome, output, or response variables. They are "dependent" in that the outcome depends on the effects of the variables being manipulated.
Control variables are also called background or classification variables because they need to be controlled, held constant, or randomized so that their effects are neutralized or accounted for during the study.
Extraneous variables can be controlled or ruled out using the following processes:
Random assignment of subjects,
The ability to manipulate the instrument or test,
The time when the measurements of the dependent variable occur (randomized data collection times are necessary), and
Which groups are measured.
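Random assignment of subjects, the first process listed above, can be sketched as follows. This is a minimal example under stated assumptions: the subject IDs, group names, and the randomly_assign helper are hypothetical, and a fixed seed is used only so the run can be repeated.

```python
import random

def randomly_assign(subjects, groups=("treatment", "control"), seed=None):
    """Shuffle the subjects, then deal them into the groups in turn."""
    rng = random.Random(seed)
    shuffled = list(subjects)
    rng.shuffle(shuffled)
    assignment = {group: [] for group in groups}
    for index, subject in enumerate(shuffled):
        assignment[groups[index % len(groups)]].append(subject)
    return assignment

subject_ids = list(range(1, 21))  # hypothetical IDs for 20 subjects
result = randomly_assign(subject_ids, seed=42)
print({group: len(members) for group, members in result.items()})
```

Because every subject has the same chance of landing in either group, extraneous characteristics tend to be spread evenly across the groups rather than concentrated in one of them.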
Characteristics of Leisure and Recreation Surveys
Standard Independent variables: gender, age, income, ethnicity, household characteristics, and place of residence.
Primary Dependent variables: attitude, interests, and questions relating to park, recreation and leisure services issues.
Three Types of Analysis
Univariate analysis is the examination of the distribution of cases on only one variable (including mean, median, and mode).
Bivariate analysis is the simultaneous evaluation of two variables, where one is independent and one is dependent.
Multivariate analysis uses two or more independent variables and one dependent variable.
3.2 Getting Your Data Together: Organizing and Coding Quantitative Data
Coding involves organizing the variables used in a measurement instrument to collect data. Coding is necessary for computer analysis.
Values are numbers assigned to data from a variable.
Coding systems are arbitrary, so a codebook is created by the evaluator to identify the variables and their associated codes. A codebook will include the variable name, the value labels for each variable, and the corresponding question from the questionnaire. See table 3.2 (2) page 268 in the text.
Data coding, which is essential to the process of data analysis, uses either:
precoding, building the numbers corresponding to responses into the questionnaire, or
postcoding, assigning numbers to response categories at the time of data entry.
The specific number associated with each response often depends on the type of scale being used in the questionnaire.
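A codebook can be represented as a simple lookup structure. The sketch below is illustrative only: the variable names, question wording, and value labels are hypothetical and are not taken from the text's table.

```python
# Hypothetical codebook: variable name -> question text and value labels.
codebook = {
    "gender": {
        "question": "Q1. What is your gender?",
        "values": {1: "Female", 2: "Male"},
    },
    "satisfaction": {
        "question": "Q2. How satisfied were you with the trail?",
        "values": {1: "Very dissatisfied", 2: "Dissatisfied", 3: "Neutral",
                   4: "Satisfied", 5: "Very satisfied"},
    },
}

def encode(variable, response):
    """Return the numeric code for a written response (postcoding)."""
    for code, label in codebook[variable]["values"].items():
        if label.lower() == response.strip().lower():
            return code
    raise ValueError(f"{response!r} is not in the codebook for {variable!r}")

def decode(variable, code):
    """Return the value label that a numeric code stands for."""
    return codebook[variable]["values"][code]

print(encode("satisfaction", "Satisfied"))  # 4
print(decode("gender", 1))                  # Female
```

Keeping the question text next to the value labels mirrors the three elements of a codebook described above.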
3.3 Univariate Statistical Analyses: Describing What Is
The Basics of Descriptive Analysis
Descriptive statistics describe and summarize the characteristics of your data. Any form of data may provide descriptive information.
Frequency Counts and Percentages
Frequency is the number of times a score occurs in a distribution (item).
Percentage is the number of responses in a specific category divided by the total number of respondents for a particular question, multiplied by 100.
Example: 22 redheads + 51 brunettes + 15 blondes = 88 respondents (N = 88), so 22 redheads divided by 88 = 25%.
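The hair-color example can be reproduced with a short sketch using Python's standard library. The variable names are illustrative.

```python
from collections import Counter

# Rebuild the raw responses from the example: 22 + 51 + 15 = 88.
responses = ["redhead"] * 22 + ["brunette"] * 51 + ["blonde"] * 15

frequencies = Counter(responses)          # frequency count per category
total = sum(frequencies.values())         # N, the number of respondents
percentages = {category: 100 * count / total
               for category, count in frequencies.items()}

for category, count in frequencies.items():
    print(f"{category}: n={count}, {percentages[category]:.1f}%")
```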
Central tendency gives one score or measure that represents an entire group.
Mean: the sum of the scores divided by the number of scores. Also known as the average.
Median: the point that divides a distribution of scores exactly in half.
Mode: the most frequently occurring score (the score with the greatest frequency).
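The three measures can be computed with Python's statistics module. The scores below are made up for illustration.

```python
import statistics

scores = [2, 3, 3, 4, 5, 7, 11]  # hypothetical questionnaire scores

mean_score = statistics.mean(scores)      # sum of scores / number of scores
median_score = statistics.median(scores)  # middle score once sorted
mode_score = statistics.mode(scores)      # most frequently occurring score

print(mean_score, median_score, mode_score)
```

For these scores the mean is 5, the median is 4, and the mode is 3. The single high score of 11 pulls the mean above the median, which is one reason to report more than one measure.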
Variations in Data Characteristics
Range is one of the easiest measures of dissimilarity or dispersion to calculate. Range is the difference between the maximum and minimum observed values.
Variance is a measure of dispersion that is based on all observations and describes the extent to which the scores differ from each other. It is the average squared distance of the scores from the mean.
Standard deviation is the square root of the variance and can be read as roughly the typical distance of each score from the mean. The more the scores cluster around the mean, the more the evaluator can conclude that everyone is performing at about the same level or answering questions similarly on a questionnaire.
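A minimal sketch of the three dispersion measures, using made-up scores. It uses the population formulas (dividing by N); statistics.variance and statistics.stdev divide by N - 1 instead, which is the usual choice when the scores are a sample.

```python
import statistics

scores = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical scores, mean = 5

score_range = max(scores) - min(scores)            # maximum minus minimum
population_variance = statistics.pvariance(scores)  # mean squared deviation
population_sd = statistics.pstdev(scores)           # square root of variance
sample_variance = statistics.variance(scores)       # divides by N - 1

print(f"range = {score_range}")
print(f"variance = {population_variance:.2f}")
print(f"standard deviation = {population_sd:.2f}")
print(f"sample variance = {sample_variance:.3f}")
```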
The Distribution Curve (What Do My Data Look Like?)
Normal Distribution (Bell Curve): with a representative sample, scores on a variable will cluster around the middle of the distribution, and the frequency of scores will decrease as you move away from the central concentration. The Normal Distribution is the most important theoretical distribution in statistics. In a normal distribution about 68% of the observations fall within 1 standard deviation of the mean and about 95% of all observations fall within 2 standard deviations of the mean. See 3.3 (2) on page 274 of the text.
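The 68% and 95% figures can be checked numerically. The sketch below relies on the standard identity that the share of a normal distribution within k standard deviations of the mean equals erf(k / sqrt(2)); the function name is illustrative.

```python
import math

def share_within(k):
    """Share of a normal distribution within k standard deviations of the mean."""
    return math.erf(k / math.sqrt(2))

print(round(share_within(1), 4))  # 0.6827
print(round(share_within(2), 4))  # 0.9545
```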
Skewness describes distributions that are not symmetric and have a "tail." The skew is positive if the tail extends toward the larger values and negative if the tail extends toward the smaller values.
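One common way to put a number on skewness is the moment coefficient (the third central moment divided by the variance raised to the power 1.5). The text does not specify a formula, so this choice is an assumption, and the data are invented.

```python
def skewness(scores):
    """Moment coefficient of skewness: m3 / m2 ** 1.5."""
    n = len(scores)
    mean = sum(scores) / n
    m2 = sum((s - mean) ** 2 for s in scores) / n  # variance (population)
    m3 = sum((s - mean) ** 3 for s in scores) / n  # third central moment
    return m3 / m2 ** 1.5

right_tailed = [1, 2, 2, 3, 3, 3, 10]          # tail toward larger values
left_tailed = [-score for score in right_tailed]  # mirror image
symmetric = [1, 2, 3, 4, 5]

print(skewness(right_tailed) > 0)  # True: positive skew
print(skewness(left_tailed) < 0)   # True: negative skew
print(skewness(symmetric))         # 0.0
```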
3.4 The Word on Statistical Significance and Its Meaning
Statistical significance refers to the unlikeliness that relationships observed in the sample could be attributed to chance alone. In the social sciences, statistical significance by convention is usually set at probability (p) < .05. This means that, if there were actually no relationship, results at least this strong would be expected by chance fewer than 5 times out of 100. If results indicate that the independent variable was the cause of a difference, but the difference was actually due to chance, this is a Type I error. If the data show no significant difference when there actually is a difference, this is a Type II error.
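To make the idea of "chance alone" concrete, the sketch below uses an exact permutation test, a technique chosen here for illustration and not one named in the text. It asks: if group labels were assigned purely by chance, how often would the difference in means be at least as large as the one observed? The scores and function name are hypothetical.

```python
from itertools import combinations

def permutation_p_value(group_a, group_b):
    """Two-sided exact permutation test on the difference in group means."""
    pooled = list(group_a) + list(group_b)
    n_a, n_b = len(group_a), len(group_b)
    total = sum(pooled)
    observed = abs(sum(group_a) / n_a - sum(group_b) / n_b)
    extreme = 0
    splits = 0
    # Try every possible way of relabeling n_a of the scores as "group A".
    for indices in combinations(range(len(pooled)), n_a):
        sum_a = sum(pooled[i] for i in indices)
        difference = abs(sum_a / n_a - (total - sum_a) / n_b)
        if difference >= observed - 1e-12:
            extreme += 1
        splits += 1
    return extreme / splits

group_one = [1, 2, 3, 4, 5]        # hypothetical scores
group_two = [11, 12, 13, 14, 15]
p = permutation_p_value(group_one, group_two)
print(round(p, 4), p < 0.05)  # 0.0079 True
```

Only 2 of the 252 possible splits produce a difference this large, so chance alone is an unlikely explanation and the result would be called statistically significant at p < .05.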
The accepted "margin of error" for opinion and behavior surveys is 5% (.05).
Effect size is independent of sample size. Effect size quantifies the size of the difference between groups. If one group is experimental and one group is control and a statistically significant difference is found, then the effect size is a measure of the effectiveness of the treatment.
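The text does not name a specific effect size measure. One widely used option for two groups is Cohen's d, the difference in means divided by the pooled standard deviation; the sketch below assumes that measure and uses invented scores.

```python
import math
import statistics

def cohens_d(treatment, control):
    """Cohen's d: mean difference divided by the pooled standard deviation."""
    n_t, n_c = len(treatment), len(control)
    pooled_variance = ((n_t - 1) * statistics.variance(treatment)
                       + (n_c - 1) * statistics.variance(control)) / (n_t + n_c - 2)
    mean_difference = statistics.mean(treatment) - statistics.mean(control)
    return mean_difference / math.sqrt(pooled_variance)

experimental = [2, 4, 6]  # hypothetical scores
control = [1, 3, 5]
d = cohens_d(experimental, control)
print(d)  # 0.5
```

A d of 0.5 means the experimental group scored half a standard deviation higher than the control group, regardless of how many subjects were in each group.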
3.5 Inferential Statistics: The Plot Thickens
Inferential statistics help determine if relationships, associations, or differences exist within data and whether statistical significance exists. Inferential statistics allow researchers to generalize the results to a larger population.
Parametric statistics are based on the assumptions of a normal distribution and randomized sampling, with data measured at the interval or ratio level. Parametric statistical tests include t-tests, Pearson product-moment correlations, and analysis of variance (ANOVA).
Nonparametric statistics are also called distribution-free tests because they are not based on the assumption of the normal probability curve. Nonparametric tests include chi-square analysis, Mann-Whitney U, Wilcoxon matched-pairs signed ranks, Friedman, and the Spearman rank-order correlation coefficient. Nonparametric tests are generally less powerful than parametric tests when the parametric assumptions are met.
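As one example of a nonparametric test, the sketch below computes the chi-square statistic for a table of counts by comparing each observed count with the count expected if the two variables were unrelated. The table is invented, and the function returns only the statistic and degrees of freedom, not a p-value.

```python
def chi_square(table):
    """Return (chi-square statistic, degrees of freedom) for a table of counts."""
    row_totals = [sum(row) for row in table]
    column_totals = [sum(column) for column in zip(*table)]
    grand_total = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Count expected in this cell if rows and columns were unrelated.
            expected = row_totals[i] * column_totals[j] / grand_total
            statistic += (observed - expected) ** 2 / expected
    degrees_of_freedom = (len(table) - 1) * (len(column_totals) - 1)
    return statistic, degrees_of_freedom

# Hypothetical counts: rows = two groups, columns = two response categories.
observed_counts = [[10, 20],
                   [20, 10]]
chi2, df = chi_square(observed_counts)
print(round(chi2, 3), df)  # 6.667 1
```

With 1 degree of freedom, the critical value at p < .05 is 3.84, so a statistic of 6.667 would be judged statistically significant.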
Associations Among Variables
Spearman and Pearson Correlations
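The two correlations can be sketched side by side. Pearson works on the scores themselves, while Spearman applies the same formula to the ranks of the scores (tied scores share the average rank). The helper names and data are illustrative.

```python
import math

def pearson(x, y):
    """Pearson product-moment correlation of two equal-length lists."""
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    covariation = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    return covariation / math.sqrt(ss_x * ss_y)

def average_ranks(values):
    """Rank from 1 (smallest); tied values receive the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared = (start + end) / 2 + 1  # mean of the 1-based positions
        for position in range(start, end + 1):
            ranks[order[position]] = shared
        start = end + 1
    return ranks

def spearman(x, y):
    """Spearman rank-order correlation: Pearson applied to the ranks."""
    return pearson(average_ranks(x), average_ranks(y))

visits = [1, 2, 3, 4, 5]        # hypothetical data
spending = [1, 4, 9, 16, 25]    # rises with visits, but not in a straight line
print(round(pearson(visits, spending), 3))   # 0.981
print(round(spearman(visits, spending), 3))  # 1.0
```

Spearman reaches 1.0 because the rank order agrees perfectly, while Pearson is slightly lower because the relationship is not linear.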
Tests of Differences Between and Among Groups
t-tests (independent t-test or dependent t-test)
Analysis of Variance (ANOVA)
Wilcoxon Signed Ranks
Friedman Analysis of Variance
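Of the tests listed above, the independent t-test is the simplest to sketch by hand. The example below assumes equal variances in the two groups and computes only the t statistic and degrees of freedom; looking up the p-value is left to a t table or a statistical package. The scores and function name are hypothetical.

```python
import math
import statistics

def independent_t(group_a, group_b):
    """Return (t, degrees of freedom) for an equal-variance independent t-test."""
    n_a, n_b = len(group_a), len(group_b)
    df = n_a + n_b - 2
    pooled_variance = ((n_a - 1) * statistics.variance(group_a)
                       + (n_b - 1) * statistics.variance(group_b)) / df
    standard_error = math.sqrt(pooled_variance * (1 / n_a + 1 / n_b))
    t = (statistics.mean(group_a) - statistics.mean(group_b)) / standard_error
    return t, df

program = [5, 6, 7, 8, 9]      # hypothetical scores for program participants
comparison = [1, 2, 3, 4, 5]   # hypothetical scores for a comparison group
t_value, df = independent_t(program, comparison)
print(round(t_value, 2), df)   # 4.0 8
```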
3.6 Hurray for Computers and Data Analyses
Options for computerized data analyses are now common and easily accessible to most researchers.
Popular Statistical Packages include:
Web Reference: Statistical Software List
3.7 Qualitative Data Analysis and Interpretation: Explaining the What, How, and Why
In qualitative data analysis and interpretation, the evaluator tries to uncover perspectives that will aid in understanding the criteria being investigated.
In-depth interviews, focus groups, and qualitative observations are the most common methods to obtain qualitative data.
Analysis is the process of bringing order to qualitative data and organizing words into patterns, categories, and basic descriptive units.
Organizing Qualitative Analysis
Data reduction is the process of organizing data and developing categories for analysis. This process requires judgement calls and decision rules that the evaluator develops in summarizing and resummarizing the data.
Qualitative coding includes open, axial, and selective coding. Coding ranges from descriptive (open coding) to interpretive (axial coding) to explanatory (selective coding), depending on the stage of coding.
Write-ups indicate the most important content of a study at a particular time.
Displaying the Data
Data display is an organized assembly of information that permits the drawing of conclusions and the presentation of the respondents' words.
Techniques for Data Analysis
Qualitative analysis is applied to the data once it is organized, open coded and/or diagrammed.
Two Major Techniques
Enumeration strategies are used to supplement descriptive data.
Grounded Comparison is used for recording, coding, and analyzing qualitative data.
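Assuming that enumeration here means counting how often a coded theme appears, a simple tally might look like the sketch below. The respondent labels and code names are invented for illustration.

```python
from collections import Counter

# Hypothetical open codes attached to interview excerpts.
coded_segments = [
    ("respondent_1", "crowding"),
    ("respondent_1", "scenery"),
    ("respondent_2", "crowding"),
    ("respondent_3", "trail_conditions"),
    ("respondent_3", "crowding"),
    ("respondent_3", "crowding"),
]

# How many times each code was applied across all excerpts.
code_counts = Counter(code for _, code in coded_segments)

# How many different respondents raised each code at least once.
respondents_per_code = Counter(code for _, code in set(coded_segments))

print(code_counts["crowding"])           # 4
print(respondents_per_code["crowding"])  # 3
```

Counts like these supplement, rather than replace, the respondents' own words when the findings are written up.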
When all data are collected and analyzed, the evaluator has to draw conclusions and make recommendations based on their interpretation of the results. All data must be double-checked to support or exemplify the findings. The data will be used to illustrate how the conclusions were drawn. Conclusions will use examples of "emic" ideas expressed by the respondents and "etic" ideas expressed in the evaluator's language. The evaluator will use quotes and anecdotes to tell the story from the data.
Copyright 2012. Northern Arizona University, ALL RIGHTS RESERVED