Former SLCC Students to Review

Social Statistics

Introduction to Statistical Concepts 

PERCENTAGE

This term refers to a method of standardizing for size that indicates the frequency of occurrence of a category per 100 cases. To calculate it, we simply multiply a proportion by 100: % = (100) f/N, where f is the frequency of a category and N is the total number of cases in the distribution.

PROPORTION

This term compares the number of cases in a given category with the total size of the distribution. We can convert any frequency into one of these by dividing the number of cases in a given category, f, by the total number of cases in the distribution, N: P = f/N. For example, if 10 of the 40 students majoring in engineering are male, the proportion of males is P = 10/40 = .25.
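A minimal Python sketch of the proportion formula and the percentage formula (100 times f/N), applied to the engineering example. The function names `proportion` and `percentage` are illustrative only, not part of any library.

```python
def proportion(f: int, n: int) -> float:
    """Return P = f/N: cases in a category divided by total cases."""
    if n <= 0:
        raise ValueError("N must be positive")
    return f / n


def percentage(f: int, n: int) -> float:
    """Return % = (100) f/N, i.e. the proportion multiplied by 100."""
    return 100 * proportion(f, n)


# 10 male students out of 40 engineering majors
print(proportion(10, 40))   # 0.25
print(percentage(10, 40))   # 25.0
```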

CUMULATIVE FREQUENCY

This term is obtained for any category (or class interval) by adding the number of times a particular score occurs in that category to the total number of scores in all categories below it.

CUMULATIVE PERCENTAGE

This term reflects the percentage of cases having a given score or a lower score. To calculate it, we modify the formula for percentage (%) introduced earlier as follows: c% = (100) cf/N, where "cf" is the cumulative frequency in a category and "N" is the total number of cases in the distribution.
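Cumulative frequency and cumulative percentage can both be computed by keeping a running total from the lowest category upward. This sketch assumes the categories are already listed from lowest to highest score; the scores and frequencies in the table are hypothetical.

```python
from itertools import accumulate

# Hypothetical frequency table: (score, f), ordered from lowest to highest score
table = [(1, 2), (2, 3), (3, 5), (4, 6), (5, 4)]

freqs = [f for _, f in table]
n = sum(freqs)                       # N, total number of cases
cum_freqs = list(accumulate(freqs))  # cf: a category's f plus all frequencies below it

for (score, f), cf in zip(table, cum_freqs):
    c_pct = 100 * cf / n             # c% = (100) cf/N
    print(f"score={score} f={f} cf={cf} c%={c_pct:.1f}")
```

With N = 20, the cumulative frequencies are 2, 5, 10, 16, 20 and the cumulative percentages are 10.0, 25.0, 50.0, 80.0, 100.0; the top category always reaches 100 percent.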

BAR GRAPH

This item, also known as a histogram, can accommodate any number of categories at any level of measurement and, therefore, is widely used in social research.

FREQUENCY POLYGON

This graph can accommodate a wide variety of categories. It tends to stress continuity along a scale rather than differentness and, therefore, is particularly useful for depicting ordinal and interval data. Frequencies are indicated by a series of points placed over the score values or the midpoints of the class intervals, and the height of each point indicates the frequency of occurrence. Adjacent points are connected with straight lines, and the line is dropped to the baseline at either end. These connecting lines are what give the graph its emphasis on continuity.

NEGATIVELY SKEWED

This is a type of distribution in which more respondents receive high than low scores, resulting in a longer tail on the left than on the right. For example, if this were the distribution of grades on a final examination, we could say that most students did quite well and a few did poorly.

POSITIVELY SKEWED

This term refers to a distribution in which more respondents receive low than high scores, resulting in a longer tail on the right than on the left. For example, if this were the distribution of final examination grades in a hypothetical classroom, most students would have scored quite low, except for a few who did well.

CENTRAL TENDENCY

This term is a single measure that represents what is "average" or "typical" of a set of data; it is a value generally located toward the middle or center of a distribution. The mode, the median, and the mean are measures of this type.

MODE

This measure reflects the most frequent, most typical, or most common value in a distribution. For example, there are more Protestants in the United States than people of any other religion, so Protestantism is this measure of religious affiliation in the United States. To obtain this score from numerical data in either raw or grouped form, simply find the score or category that occurs most often in the distribution. For instance, in the set of scores 1, 2, 3, 1, 3, 2, 1, 1, 2, 1, this measure is 1, because 1 occurs five times, more often than any other score. (This measure is often easiest to see when the data are arranged in an "array", that is, ordered from lowest to highest or from highest to lowest, or at least grouped. The example above is more obvious in this form: 1, 1, 1, 1, 1, 2, 2, 2, 3, 3.)
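The mode of the example scores can be checked with Python's standard library, as a small sketch. `Counter` tallies each score; `statistics.multimode` (Python 3.8+) returns every value tied for the highest frequency, which matters because a distribution can have more than one mode.

```python
from collections import Counter
from statistics import multimode

scores = [1, 2, 3, 1, 3, 2, 1, 1, 2, 1]

print(sorted(scores))          # the array: [1, 1, 1, 1, 1, 2, 2, 2, 3, 3]

counts = Counter(scores)       # frequency of each score
print(counts.most_common(1))   # [(1, 5)]: score 1 occurs five times

# multimode lists all values tied for the highest frequency
print(multimode(scores))       # [1]
```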

MEDIAN

This item refers to the middlemost point in a distribution; it is the measure of central tendency that cuts the distribution into two equal parts. If we have an odd number of cases, this term is the case that falls exactly in the middle of the distribution once the scores are arranged in order. Its position can be found with the formula (N+1)/2, or by inspection. For example, 16 is this term for the scores 11, 12, 13, 16, 17, 20, 25, because it divides the distribution so that there are three scores on either side of it. According to the formula, (7+1)/2 = 4, so this term is the fourth score counting from either end, 16.
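A sketch of the (N+1)/2 position rule in Python, using the seven example scores. The helper names are hypothetical. The text only covers an odd number of cases; the even-N branch (averaging the two middle cases) is the usual convention and is an addition here. `statistics.median` serves as a cross-check.

```python
from statistics import median


def median_position(n: int) -> float:
    """1-based position of the median in an ordered list: (N + 1) / 2."""
    return (n + 1) / 2


def median_by_position(scores: list[float]) -> float:
    ordered = sorted(scores)              # the position rule only works on ordered data
    pos = median_position(len(ordered))
    if pos.is_integer():                  # odd N: a single middle case
        return ordered[int(pos) - 1]
    # even N (assumption beyond the text): average the two middle cases
    return (ordered[int(pos) - 1] + ordered[int(pos)]) / 2


scores = [11, 12, 13, 16, 17, 20, 25]
print(median_position(len(scores)))  # 4.0
print(median_by_position(scores))    # 16
print(median(scores))                # 16 (standard-library check)
```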

MEAN

This is by far the most commonly used measure of central tendency. It is obtained by adding up a set of scores and dividing by the number of scores; formally, we define this term as the sum of a set of scores divided by the total number of scores in the set. This term can be regarded as the "center of gravity" of a distribution, similar to the balance point of a seesaw (a lever resting on a fulcrum). Unlike the mode, this item is not always the score that occurs most often.
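A short Python illustration of the mean as the sum of the scores divided by their number, reusing the seven scores from the median example (that reuse is just for continuity). It also shows that the mean need not be one of the actual scores.

```python
from statistics import mean

scores = [11, 12, 13, 16, 17, 20, 25]

total = sum(scores)   # sum of the scores: 114
n = len(scores)       # number of scores: 7
x_bar = total / n     # mean = sum / N

print(total, n, round(x_bar, 2))   # 114 7 16.29
print(round(mean(scores), 2))      # 16.29, standard-library check
print(x_bar in scores)             # False: the mean is not an actual score here
```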

DEVIATION

This term reflects the distance and direction of any raw score from the mean. To find this item for a particular raw score, we simply subtract the mean from that score. A positive value indicates a score above the mean; a negative value, a score below it.
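Computing the deviation of each score from the mean makes the "center of gravity" idea concrete: a well-known property (not stated in the text above) is that deviations from the mean always sum to zero, so the mean balances the distribution. This sketch reuses the hypothetical seven scores from the median example.

```python
scores = [11, 12, 13, 16, 17, 20, 25]
x_bar = sum(scores) / len(scores)

# deviation = raw score minus the mean; the sign gives the direction
deviations = [x - x_bar for x in scores]
for x, d in zip(scores, deviations):
    print(f"{x:>3}  {d:+.2f}")

# the deviations cancel out (up to floating-point rounding)
print(round(sum(deviations), 10))
```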

FREQUENCY DISTRIBUTION

This term represents a table showing the categories, score values, or class intervals of a variable together with their frequency of occurrence. It describes the number of times the various attributes of a variable are observed in a sample. The report that 53 percent of a sample were men and 47 percent were women would be a simple example of this term.
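A minimal sketch of building a frequency distribution table in Python. The sample of 100 respondents is hypothetical, constructed only to reproduce the 53 percent / 47 percent example; each row shows a category, its frequency f, and its percentage.

```python
from collections import Counter

# Hypothetical sample of 100 respondents matching the 53/47 example
sample = ["man"] * 53 + ["woman"] * 47

freq = Counter(sample)      # category -> frequency
n = sum(freq.values())      # N, total number of cases

print(f"{'Category':<10}{'f':>5}{'%':>8}")
for category, f in freq.most_common():
    print(f"{category:<10}{f:>5}{100 * f / n:>8.1f}")
```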

PROBABILITY

This term refers to the relative likelihood of occurrence of any given outcome or event; that is, the likelihood associated with an event is the number of times that event can occur relative to the total number of times any event can occur. For example, if a room contains three men and seven women, the likelihood that the next person coming out of the room is a man would be 3 in 10. In the same way, the likelihood of drawing a particular card (let's say the ace of spades) from a shuffled pack of 52 cards is 1 in 52, since the outcome "ace of spades" can occur only once out of the 52 outcomes that can occur.
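The two probability examples can be expressed as exact fractions in Python. This sketch assumes, as the examples implicitly do, that every outcome is equally likely; the `probability` helper is hypothetical.

```python
from fractions import Fraction


def probability(favorable: int, total: int) -> Fraction:
    """Outcomes favoring the event divided by all possible outcomes,
    assuming every outcome is equally likely."""
    return Fraction(favorable, total)


p_man = probability(3, 3 + 7)      # three men and seven women in the room
p_ace_spades = probability(1, 52)  # one ace of spades in a 52-card pack

print(p_man, float(p_man))         # 3/10 0.3
print(p_ace_spades)                # 1/52
```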

MUTUALLY EXCLUSIVE

No two outcomes can occur simultaneously. For example, a single toss of a coin cannot come up both heads and tails.

NORMAL CURVE

This is a theoretical or ideal model obtained from a mathematical equation, rather than from actually conducting research and gathering data. It can be used to describe distributions of scores, interpret the standard deviation, and make statements of probability.
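Because the normal curve comes from an equation, areas under it (and hence probabilities) can be computed directly. This sketch uses Python's `statistics.NormalDist` (Python 3.8+) to find the share of the curve lying within 1, 2, and 3 standard deviations of the mean, which is how the standard deviation is commonly interpreted.

```python
from statistics import NormalDist

z = NormalDist(mu=0, sigma=1)   # the standard normal curve

# Area under the curve within k standard deviations of the mean
for k in (1, 2, 3):
    area = z.cdf(k) - z.cdf(-k)
    print(f"within +/-{k} SD: {area:.4f}")
```

The areas come out to roughly 0.6827, 0.9545, and 0.9973, the familiar "68, 95, 99.7 percent" pattern.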

RANDOM SAMPLE

A sample drawn so that every member of the population has an equal chance of being selected.
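A sketch of drawing a simple random sample in Python. `random.sample` selects without replacement, giving every member of the list the same chance of selection. The population of 500 numbered members and the sample size of 25 are hypothetical, and the seed is fixed only so the example repeats.

```python
import random

# Hypothetical sampling frame: 500 population members identified by number
population = list(range(1, 501))

rng = random.Random(42)                # seeded only so the example is repeatable
sample = rng.sample(population, k=25)  # equal chance for every member, no repeats

print(sorted(sample))
```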

CONFIDENCE INTERVAL

This term refers to the range within which we are confident the population parameter lies. It is the range between the upper and lower values for a given level of confidence. For example, we might say we are 99.9 percent confident that our statistic falls within +/- 7.5 percentage points of the parameter. The range from -7.5 to +7.5 percentage points around our statistic is called by this name.

STATISTICALLY SIGNIFICANT DIFFERENCE

A sample difference that reflects a real population difference and not just a sampling error.

PARAMETRIC TEST

A statistical procedure which requires that the characteristic studied be normally distributed in the population and that the researcher have interval data.

NONPARAMETRIC TEST

A statistical procedure which makes no assumptions about the way the characteristic being studied is distributed in the population and requires only ordinal or nominal data.

UNIT OF OBSERVATION

This term refers to the type of elements being studied or observed. Individuals are most often an example of this; but sometimes collections or aggregates such as families, census tracts, or states are used.

PARTICIPANT OBSERVATION

This term reflects the researcher's involvement in the daily life of the people under study, either openly in the role of researcher or covertly in some disguised role, viewing things that happen, listening to what is said, and questioning people, over some length of time.

NOMINAL LEVEL OF MEASUREMENT

This term refers to the process of placing cases into categories and counting their frequency of occurrence. For example, we might use this technique to indicate whether each respondent is prejudiced in attitude toward Hispanics (see Table 1.1). We might question the 10 students in a given class and determine that 5 can be regarded as "(1) prejudiced" and 5 as "(2) unprejudiced".

ORDINAL LEVEL OF MEASUREMENT

This term refers to the process of ranking cases in terms of the degree to which they have any given characteristic, but this does not indicate the magnitude of differences between numbers. For instance, the social researcher who employs this technique to study prejudice toward Hispanics does not know how much more prejudiced one respondent is than another.

INTERVAL LEVEL OF MEASUREMENT

This term refers to the process of assigning a score to cases so that the magnitude of differences between them is known and meaningful. This technique uses constant units of measurement (for example, dollars or cents, Fahrenheit, yards, or feet), which yield equal intervals between points on the scale.

DESCRIPTIVE STATISTICS

This term refers to computations describing either the characteristics of a sample or the relationship among variables in a sample. They merely summarize a set of sample observations.

VARIABLE

Any characteristic that varies from one individual to another. Hypotheses usually contain an independent variable (cause) and a dependent variable (effect).

CORRELATION STRENGTH

The degree of association between two variables.

Correlation coefficients range from -1.00 to +1.00 as follows:

-1.00 indicates a perfect negative correlation

-0.60 indicates a strong negative correlation

-0.20 indicates a weak negative correlation

0.00 indicates no correlation

+0.20 indicates a weak positive correlation

+0.60 indicates a strong positive correlation

+1.00 indicates a perfect positive correlation

NEGATIVE CORRELATION

(X and Y variables move in opposite directions)

The direction of the relationship wherein individuals who score high on the X variable score low on the Y variable; individuals who score low on the X variable score high on the Y variable.

SCATTER PLOT

A graph that shows the way scores on any two variables X and Y are distributed throughout the range of possible score values. 

SPURIOUS RELATIONSHIP

A noncausal relationship between two variables that exists only because of the common influence of a third variable. This relationship disappears if the third variable is held constant.
