Data: ageheight.txt
Keywords: Simple linear regression.
Description: Obviously the height of a child is not constant, but
increases over time. On the other hand it is well-known that the growth
pattern varies between children. In this dataset the focus is on determining
the general growth pattern. One way to explore this is by using the average
of several children's heights, as presented in this dataset.
The response variable is the average height of a group of 161 children in
Kalama, an Egyptian village: the site of a study of nutrition in developing
countries. The data were obtained by measuring the heights of all 161
children in the village each month over several years. Time is the
explanatory variable.
Number of observations: 12
| Variable | Description |
| age | Age in months |
| height | Average height in centimetres for children at this age |
Source: DASL.
Data: babycrawl.txt
Keywords: Linear regression, correlation.
Description: This study investigated whether babies take longer to
learn to crawl in cold months when they are often bundled in clothes that
restrict their movements, than in warmer months. The study sought an
association between babies' first crawling age and the average temperature
in the month they first try to crawl (about 6 months after birth).
Parents brought their babies into the University of Denver Infant Study
Center between 1988 and 1991 for the study. The parents reported the birth month
and age at which their child was first able to creep or crawl a distance of
four feet in one minute.
Data were collected on 208 boys and 206 girls (40 pairs of which were
twins). Correlation and regression can be used to examine the relationship
between the average crawling age and the average temperature.
There are a few problems with this data set, which might affect analyses:
- The babies are not all independent because there are twins in the
study.
- The normality assumption is dubious since outliers can only occur at
higher ages of first crawling.
- The study was conducted on self-selected volunteers, who may be
different from the general population.
Number of observations: 12
| Variable | Description |
| month | Month of birth |
| crawlingage | Average age (in weeks) at which this group learned to crawl |
| sd | Standard deviation of time to crawling for this group |
| n | Number of infants in that birth month group |
| fahrenheit | Average monthly temperature in degrees Fahrenheit six months after birth month |
| celsius | Average monthly temperature in degrees Celsius six months after birth month |
Source: DASL.
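The table lists the temperature in both Fahrenheit and Celsius. As a minimal sketch in plain Python, using made-up values rather than the contents of babycrawl.txt, the conversion can be checked and the correlation with crawling age can be seen to be the same in either unit. The helper names `fahrenheit_to_celsius` and `pearson_r` are illustrative assumptions, not part of the dataset.

```python
import math

def fahrenheit_to_celsius(f):
    """Convert degrees Fahrenheit to degrees Celsius."""
    return (f - 32.0) * 5.0 / 9.0

def pearson_r(x, y):
    """Sample Pearson correlation coefficient of two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical illustration values, NOT the babycrawl.txt data.
temps_f = [30.0, 40.0, 50.0, 60.0, 70.0]
crawling_age = [33.5, 33.0, 32.4, 31.6, 30.9]

temps_c = [fahrenheit_to_celsius(f) for f in temps_f]
r_f = pearson_r(temps_f, crawling_age)
r_c = pearson_r(temps_c, crawling_age)
print(round(r_f, 4), round(r_c, 4))
```

Because Celsius is an increasing linear function of Fahrenheit, the two correlations agree; only the regression slope and intercept change with the unit.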
Data: beetles.txt
Keywords: Simple linear regression, through origin.
Description: In a botanical experiment a researcher wanted to
estimate the number of individuals of a particular species of beetle
(Diaperus maculatus) within fruiting bodies ('brackets') of the birch
bracket fungus Polyporus betulinus. (This is a shelf fungus that grows on
the trunks of dead birch trees.) When the brackets are stored in the
laboratory, the beetle larvae within them mature over several weeks; the
adults then emerge and can be removed and counted.
Number of observations: 25
| Variable | Description |
| weight | Weights of the brackets (in grams) |
| beetles | Number of beetles in bracket |
Source: Pielou, E.C. (1974) Population and Community
Ecology: Principles and Methods, Gordon and Breach, New York, pp. 117-121.
Data: brainweight.txt
Keywords: Regression, log-log transformation.
Description: The average brain and body weights for 62 species of
mammals. In ST111, it is considered as a problem of modeling brain weight as
a function of body weight. These data were taken from a larger study and
were collected for another purpose.
Number of observations: 62
| Variable | Description |
| body | Body weight (in kilos) |
| brain | Brain weight (in grams) |
Source: Allison, T. and Cicchetti, D.H. (1976) Sleep in mammals:
Ecological and constitutional correlates, Science, 194,
pp. 732-734.
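The keywords mention a log-log transformation. A minimal sketch of the idea, in plain Python: if brain weight follows a power law in body weight, then regressing log(brain) on log(body) gives a straight line whose slope is the exponent. The numbers below come from a hypothetical power law (factor 10, exponent 0.75) chosen for illustration; they are not the brainweight.txt data, and `fit_line` is an assumed helper name.

```python
import math

def fit_line(x, y):
    """Ordinary least squares for y = a + b * x; returns (a, b)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((u - mx) ** 2 for u in x)
    b = sum((u - mx) * (v - my) for u, v in zip(x, y)) / sxx
    return my - b * mx, b

# Hypothetical power law brain = 10 * body**0.75; NOT the brainweight.txt data.
body_kg = [0.02, 0.5, 3.0, 60.0, 500.0, 6000.0]
brain_g = [10.0 * w ** 0.75 for w in body_kg]

# Fit on the log-log scale: log(brain) = a + b * log(body).
a_hat, b_hat = fit_line([math.log(w) for w in body_kg],
                        [math.log(g) for g in brain_g])

# exp(a_hat) estimates the multiplicative factor, b_hat the exponent.
print(math.exp(a_hat), b_hat)
```

With real data the points scatter around the line, and the residuals on the log scale are what should be checked for constant variance.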
Data: cemstren.txt
Keywords: Regression, non-constant variance, non-linear.
Description: One of the things that influences the tensile strength
of cement is the length of time for which the cement is 'cured' (that is,
dried). An experiment was set up to test different batches of cement for
tensile strength, after different curing times.
Number of observations: 21
| Variable | Description |
| days | Number of days of curing |
| strength | Tensile strength of cement |
Source: Hald, A. (1952) Statistical Theory with Engineering
Applications, New York, John Wiley.
Data: constantcar.txt
Keywords: Linear regression, through origin.
Description: This is a hypothetical dataset. Imagine driving a car
at a constant speed of 50 km/h and recording, at 5-minute intervals, the
distance travelled so far.
Number of observations: 12
| Variable | Description |
| time | Time in minutes |
| distance | Distance in km |
Source: The observations are generated using S-Plus.
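A regression through the origin fits the model distance = b × time with no intercept, and the least-squares slope is sum(x·y) / sum(x²). The sketch below rebuilds a noise-free version of the scenario (50 km/h, a reading every 5 minutes, 12 readings) in plain Python; the actual constantcar.txt values may differ, for example by including simulated measurement error, and `slope_through_origin` is an assumed helper name.

```python
# Hypothetical noise-free reconstruction of the scenario, NOT the file contents.
SPEED_KMH = 50.0
time_min = [5.0 * k for k in range(1, 13)]              # 5, 10, ..., 60 minutes
distance_km = [SPEED_KMH * t / 60.0 for t in time_min]  # distance covered so far

def slope_through_origin(x, y):
    """Least-squares slope of the model y = b * x (no intercept)."""
    return sum(a * b for a, b in zip(x, y)) / sum(a * a for a in x)

b_hat = slope_through_origin(time_min, distance_km)     # km per minute
speed_hat_kmh = 60.0 * b_hat                            # back to km per hour
print(speed_hat_kmh)
```

Forcing the line through the origin is natural here because no distance has been covered at time zero.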
Data: crystals.txt
Keywords: Simple linear regression.
Description: Measurements were made on the axial lengths of ice
crystals at various times between 50 seconds and 180 seconds after
introduction into a chamber maintained at a constant temperature of -5 °C.
Number of observations: 43
| Variable | Description |
| length | Axial length of crystal (in micrometres) |
| time | Time (in seconds) after introduction of the ice crystal into the chamber |
Source: Ryan, B.F., Wishart, E.R. and Shaw, D.E. (1976) The growth
rates and densities of ice crystals between -3 °C and -21 °C, J.
Atmospheric Sciences, 33, pp. 842-850.
Data: gasification.txt
Keywords: Linear regression.
Description: The data represent the fuel gas temperature (in
degrees Fahrenheit) and unit heat rate (in Btus per kilowatt hour) for a
combustion turbine to be used in coal gasification.
Number of observations: 9
| Variable | Description |
| temp | Fuel gas temperature |
| heat | Unit heat rate |
Source: Milton, J.S. & Arnold, J.C. (1995) Introduction to
probability and statistics, McGraw Hill International Editions, p.445.
Data: geese.txt
Keywords: Regression, non-constant variance.
Description: Two airborne observers (labelled A and B,
respectively) were used to estimate the sizes of flocks of snow geese in an
area west of Hudson Bay in Canada. In one study, photographs were also taken
of the flocks, and careful counts were made from the film.
Number of observations: 45
| Variable | Description |
| photo | Photographic counts |
| Aestimate | Number estimated by observer A |
| Bestimate | Number estimated by observer B |
Source: Lunneborg, C.E. (1994) Modeling Experimental and
Observational Data, Duxbury Press, p. 115.
Data: hdl.txt
Keywords: Regression.
Description: An experiment involved a quantitative analysis of
factors found in high-density lipoprotein (HDL) in a sample of human blood
serum. Three variables thought to be predictive of or associated with HDL
measurements are the total cholesterol and total triglyceride concentration
in the sample, and the presence or absence of a certain sticky component
called sinking pre-beta (or SPB). The data in this data set correspond to
samples for which the SPB was absent.
Number of observations: 21
| Variable | Description |
| hdl | Concentration of HDL |
| cholest | Total concentration of cholesterol |
| triglyc | Total concentration of triglyceride |
Source: Kleinbaum, D.G., Kupper, L.L., Muller, K.E., Nizam, A.
(1998) Applied Regression Analysis and Other Multivariable Methods,
3rd Edition, Duxbury Press, Brooks/Cole Publishing Company, p. 202.
Data: houses.txt
Keywords: Linear regression.
Description: It is natural to expect that the larger the house, the
higher the price. That is, we expect price and size of the house to be
positively correlated. Ten houses were randomly selected among newspaper
ads for houses. The relationship between area and price is vague, suggesting
that it is not only the size of a house that determines the price.
Number of observations: 10
| Variable | Description |
| size | Size of the house in m² |
| price | Price in 1000 DKK |
Source: Bent Jørgensen.
Data: mobility.txt
Keywords: Linear regression, correlation.
Description: This dataset concerns the comparison of two measures
of the mobility of elderly people. The two methods to be compared are the
Berg score and Timed Up and Go (TUG). The Berg score is a measure based on
how well the person performs in a number of different tasks. A low score
corresponds to low mobility. The TUG score is simply the time it takes a
person to get up from a chair, walk three metres and return to the chair.
Measuring the Berg score is much more demanding and time-consuming than
measuring the TUG score. It is of interest to determine the relationship
between the results obtained using the two methods. If there is a strong
relation, the fast method can be used as a good predictor of the slow
method.
Number of observations: 16
| Variable | Description |
| tug | TUG score |
| berg | Berg score |
Source: Dorte Skovhede.
Data: olympic.txt
Keywords: Linear regression, time series, prediction.
Description: This dataset contains the gold medal performances in
the men's long jump, high jump and discus for the modern Olympic games from
1900 to 1984. Regressions and scatterplots of performance variables versus
year show performance improvement.
The World Wars create some gaps in the data and can be seen in the graphical
displays.
Number of observations: 20
| Variable | Description |
| high | Height of high jump (cm) |
| discus | Distance of throw (cm) |
| long | Distance of jump (cm) |
| year | Year of the Olympic Games |
Source: DASL, but converted from inches to cm. Data from 1988 and 1992.
Data: paper.txt
Keywords: Regression, transformation.
Description: The tensile strength (p.s.i.) of Kraft paper
was measured against the percentage of hardwood in the batch of pulp from
which the paper was produced.
Number of observations: 19
| Variable | Description |
| strength | Tensile strength |
| wood | Percentage of hardwood in pulp |
Source: Joglekar, G., Schuenemeyer, J.H. and LaRiccia, V. (1989)
Lack-of-fit testing when replicates are not available, American
Statistician, 43, pp. 135-143.
Data: roadmap.txt
Keywords: Linear regression, through the origin.
Description: This dataset contains the distances (in miles) by
road, and the corresponding straight line distances (measured from a map)
between twenty different pairs of points in Sheffield.
Number of observations: 20
| Variable | Description |
| road | Road distances (in miles) |
| map | Map distances (in miles) |
Source: Gilchrist, W. (1984) Statistical modelling, John
Wiley and Sons, Chichester, p.5.
Data: spinach.txt
Keywords: Linear regression.
Description: This dataset stems from a study concerning the
preservation of ascorbic acid in vegetables during drying and storing. The
amount of acid preserved is the response variable, while the percentage dry
matter is the explanatory variable.
Number of observations: 24
| Variable | Description |
| dry | Percentage dry matter after drying at 90 °C |
| acid | Percentage preserved ascorbic acid |
Source: Hald, A. (1952) Statistical Theory with Engineering
Applications, New York: Wiley.
Data: strong.txt
Keywords: Regression, non-linear, non-constant variance.
Description: This is the psychologist Strong's famous data set on
memory retention. Average percentage memory retention was measured against
passing time. The measurements were taken five times during the first hour
after subjects memorized a list of disconnected items, and then at various
times up to a week later.
Number of observations: 13
| Variable | Description |
| memory | Percentage of memory retention |
| time | Times (in minutes) |
Source: Mosteller, F., Rourke, R.E.K. and Thomas, G.B. (1970) Probability with statistical applications, 2nd edn. Addison-Wesley, p. 383.
Data: tvads.txt
Keywords: Non-linear regression.
Description: These data concern the relation between advertising
spending and advertising yield.
Number of observations: 21
| Variable | Description |
| company | Company name |
| budget | TV advertising budget, 1983 ($ millions) |
| impression | Millions of retained impressions per week |
Source: DASL.
Data: velocity.txt
Keywords: Non-linear regression.
Description: These data represent the velocity of an enzymatic
reaction as a function of substrate concentration.
Number of observations: 12
| Variable | Description |
| Velocity | The counts per minute of radioactive product from the reaction |
| Concentration | The substrate concentration (in parts per million) |
Source: Severini, T.A. (2000) Likelihood methods in
statistics, New York: Oxford University Press, p. 356.
Data: windspeed.txt
Keywords: Regression, transformation.
Description: These data concern the production of power from
windmills. Direct current output was measured against wind speed (in miles per
hour).
Number of observations: 25
| Variable | Description |
| output | Current output produced by the windmill |
| speed | Wind speed (in miles per hour) |
Source: Joglekar, G., Schuenemeyer, J.H. and LaRiccia, V. (1989)
Lack-of-fit testing when replicates are not available, American
Statistician, 43, pp. 135-143.
Data: anscombe.txt
Keywords: Linear Regression, correlation.
Description: It is well known that correlation coefficients can be
misleading. This dataset was constructed to shed light on whether the same
could be the case for linear regression. Make the following four
scatterplots: y1 vs x, y2 vs x, y3 vs x, and y4 vs x4 (x and x4 are
considered as explanatory variables). Make the four simple linear
regressions and compare results.
Number of observations: 11
| Variable | Description |
| x | An explanatory variable |
| y1 | A response variable |
| y2 | A response variable |
| y3 | A response variable |
| x4 | An explanatory variable |
| y4 | A response variable |
Source: Jerry Dallal's Tufts Home Page.
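The four regressions can be sketched in plain Python. The values below are the published figures of Anscombe's quartet (Anscombe, 1973); it is an assumption, not verified here, that anscombe.txt contains exactly these numbers. `fit_line` is an illustrative helper name.

```python
import math

# Published Anscombe (1973) quartet; assumed to match anscombe.txt.
x  = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
y1 = [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]
y2 = [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]
y3 = [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73]
x4 = [8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8]
y4 = [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89]

def fit_line(xs, ys):
    """Simple linear regression; returns (intercept, slope, correlation)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((u - mx) ** 2 for u in xs)
    syy = sum((v - my) ** 2 for v in ys)
    sxy = sum((u - mx) * (v - my) for u, v in zip(xs, ys))
    slope = sxy / sxx
    return my - slope * mx, slope, sxy / math.sqrt(sxx * syy)

pairs = {"y1 vs x": (x, y1), "y2 vs x": (x, y2),
         "y3 vs x": (x, y3), "y4 vs x4": (x4, y4)}
fits = {name: fit_line(xs, ys) for name, (xs, ys) in pairs.items()}

for name, (a, b, r) in fits.items():
    print(f"{name}: intercept={a:.2f} slope={b:.2f} r={r:.3f}")
```

The fitted lines and correlations are practically identical across the four pairs, which is why the scatterplots are needed to see how different the datasets are.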
Data: cement.txt
Keywords: Multiple linear regression.
Description: The heat evolved during cement hardening is influenced
by the composition of the cement. The heat evolved was measured for a number
of samples. Further, the contents of tricalcium-aluminate (TA),
tricalcium-silicate (TS) and tetracalcium-alumino-ferrite (TAF) were
measured for each sample of cement.
Number of observations: 13
| Variable | Description |
| heat | Evolved heat (in calories/g) |
| TA | Amount (as percentage of weight) of tricalcium-aluminate (TA) |
| TS | Amount (as percentage of weight) of tricalcium-silicate (TS) |
| TAF | Amount (as percentage of weight) of tetracalcium-alumino-ferrite (TAF) |
Source: Woods, H., Steiner, H.H. and Starke, H.R. (1932) Effects of
composition of Portland cement on heat evolved during hardening, Industrial and Engineering Chemistry, 24, pp. 1207-1212.
Data: children.txt
Keywords: Multiple linear regression.
Description: The weight, height, and age were measured for each
member of a sample of 12 children with a particular kind of nutritional
deficiency.
Number of observations: 12
| Variable | Description |
| weight | The child's weight |
| height | The child's height |
| age | The child's age |
Source: Kleinbaum, D.G., Kupper, L.L., Muller, K.E., Nizam, A.
(1998) Applied Regression Analysis and Other Multivariable Methods,
3rd Edition, Duxbury Press, Brooks/Cole Publishing Company, p. 112.
Data: copper.txt
Keywords: Multiple linear regression, residual analysis.
Description: These data relate to the processing of copper ore in
a given calendar month. The response variable (y) is the percentage of copper
recovered for a certain production process. The explanatory variables are
shown in the table below.
Number of observations: 24
| Variable | Description |
| Date | Date of production |
| Solids | Percentage of solids in the ore |
| Mesh | A measure of mesh size |
| y | Percentage of copper recovered for a certain production process |
| Retention | Retention time |
Source: Jørgensen, B. (1993) The Theory of Linear Models,
Chapman & Hall. (Originally supplied by R.J. MacKay.)
Data: detox3mc.txt
Keywords: Multiple linear regression.
Description: It is known that one can alter the toxicity of various
types of chemicals (e.g. drugs, pesticides or insecticides) in
mammals by inducing liver enzyme activity. This example relates to a study
investigating the relationship between detoxification of malathion (an
insecticide containing phosphorus) and induced enzyme activity in chickens.
Five different enzyme activities were induced using the enzyme inducer
3-methylcholanthrene (3-MC).
Number of observations: 10
| Variable | Description |
| detox | Detoxification of malathion (in percent, relative to a control, untreated, chicken) |
| enzyme1 | Enzyme 1 activity in a treated chicken, relative to a control chicken (in %) |
| enzyme2 | Enzyme 2 activity in a treated chicken, relative to a control chicken (in %) |
| enzyme3 | Enzyme 3 activity in a treated chicken, relative to a control chicken (in %) |
| enzyme4 | Enzyme 4 activity in a treated chicken, relative to a control chicken (in %) |
| enzyme5 | Enzyme 5 activity in a treated chicken, relative to a control chicken (in %) |
Source: Ehrich, M., Larson, C. and Arnold, J. (1983)
Organophosphate Detoxification Related by Induced Hepatic Microsomal Enzymes
in Chickens, American Journal of Veterinary Research, 45.
Data: detoxbht.txt
Keywords: Multiple linear regression.
Description: It is known that one can alter the toxicity of various
types of chemicals (e.g. drugs, pesticides or insecticides) in
mammals by inducing liver enzyme activity. This example relates to a study
investigating the relationship between detoxification of malathion (an
insecticide containing phosphorus) and induced enzyme activity in chickens.
Five different enzyme activities were induced using the enzyme inducer
butylated hydroxytoluene (BHT).
Number of observations: 10
| Variable | Description |
| detox | Detoxification of malathion (in percent, relative to a control, untreated, chicken) |
| enzyme1 | Enzyme 1 activity in a treated chicken, relative to a control chicken (in %) |
| enzyme2 | Enzyme 2 activity in a treated chicken, relative to a control chicken (in %) |
| enzyme3 | Enzyme 3 activity in a treated chicken, relative to a control chicken (in %) |
| enzyme4 | Enzyme 4 activity in a treated chicken, relative to a control chicken (in %) |
| enzyme5 | Enzyme 5 activity in a treated chicken, relative to a control chicken (in %) |
Source: Ehrich, M., Larson, C. and Arnold, J. (1983)
Organophosphate Detoxification Related by Induced Hepatic Microsomal Enzymes
in Chickens, American Journal of Veterinary Research, 45.
Data: devices.txt
Keywords: Multiple regression.
Description: Medical devices from three different suppliers for the
continuous delivery of an anti-inflammatory hormone were tested on 27
patients.
Number of observations: 27
| Variable | Description |
| supplier | The supplier of the device (labelled 1, 2 or 3) |
| time | Time the device was used (in hours) |
| remains | Amount remaining in the device after use |
Source: Efron, B. and Tibshirani, R.J. (1993) An Introduction
to the Bootstrap, Chapman and Hall, New York, p. 107.
Data: fuel.txt
Keywords: Multiple linear regression.
Description: For the 48 contiguous US states, a number of variables were
recorded in 1971-72: population size, motor fuel tax rate, number of licensed
drivers, per capita income, extent of federal-aid primary highways, and fuel
consumption.
Number of observations: 48
| Variable | Description |
| pop | 1971 Population, in thousands |
| tax | 1972 Motor fuel tax rate, in cents per gallon |
| nlic | 1971 Thousands of licensed drivers |
| inc | 1972 Per capita income in thousands of dollars |
| road | 1971 Thousands of miles of federal-aid primary highways |
| fuelc | 1972 Fuel consumption, in millions of gallons |
| dlic | 1971 Percentage of population with driver's license |
| fuel | Motor fuel consumption in gallons per person |
Source: Weisberg, S. (1985) Applied Linear Regression,
Wiley, p. 34.
Data: gifted.txt
Keywords: Multiple regression.
Description: An investigator is interested in understanding the
relationship, if any, between the analytical skills of young gifted children
and the following variables: father's IQ, mother's IQ, age in months when the
child first said 'mummy' or 'daddy', age in months when the child first
counted to 10 successfully, average number of hours per week the child's
mother or father reads to the child, average number of hours per week the
child watched an educational program on TV during the past three months,
average number of hours per week the child watched cartoons on TV during the
past three months. The analytical skills are evaluated using a standard
testing procedure, and the score on this test is used as the response
variable.
Data were collected from schools in a large city on a set of thirty-six
children who were identified as gifted children soon after they reached the
age of four.
Number of observations: 36
| Variable | Description |
| score | Score in test of analytical skills |
| fatheriq | Father's IQ |
| motheriq | Mother's IQ |
| speak | Age in months when the child first said 'mummy' or 'daddy' |
| count | Age in months when the child first counted to 10 successfully |
| read | Average number of hours per week the child's mother or father reads to the child |
| edutv | Average number of hours per week the child watched an educational program on TV during the past three months |
| cartoons | Average number of hours per week the child watched cartoons on TV during the past three months |
Source: Graybill, F.A. & Iyer, H.K., (1994) Regression
Analysis: Concepts and Applications, Duxbury, p. 511-6.
Data: grocery.txt
Keywords: Multiple regression.
Description: The manager of the marketing division of a grocery
store chain wants to conduct a study in a particular US city, where the
company wants to open a store, to understand the relationship between the
number of dollars a household spends in grocery stores each month and the
following variables: monthly income for the household, number of children in
the household, and the number of adults in the household. A group of 27
grocery shoppers was selected by simple random sampling from a study
population and asked to provide the needed information.
Number of observations: 27
| Variable | Description |
| amount | Monthly amount spent by household in grocery stores (in US$) |
| income | Monthly income for the household (in US$) |
| children | Number of children in the household |
| adults | Number of adults in the household |
Source: Graybill, F.A. and Iyer, H.K. (1994) Regression
Analysis: Concepts and Applications, Duxbury, p. 286.
Data: hospitals.txt
Keywords: Multiple linear regression.
Description: A study was made on monthly man-hours associated with
maintaining the anesthesiology service for twelve naval hospitals in the
United States.
Number of observations: 12
| Variable | Description |
| manhours | Monthly number of man-hours |
| cases | Monthly number of surgical cases |
| population | Eligible population (in thousands) |
| rooms | Number of operating rooms in the hospital |
Source: Brooks, D.G., Carroll, S.S. and Verdini, W.A. (1988)
Characterizing the domain of a regression model, American Statistician, 42, pp. 187-190.
Data: icecream.txt
Keywords: Linear regression, time series, ANCOVA, autocorrelation.
Description: The purpose of the study was to determine if ice cream
consumption depends on the variables price, income, or temperature. Further
the variables Lag-temp (the temperature the next month) and Year have been
added to the original data.
Ice cream consumption was measured over 30 four-week periods from March 18,
1951 to July 11, 1953.
Number of observations: 30
| Variable | Description |
| period | Identifier for the four-week period (1-30) |
| IC | Ice cream consumption in pints per capita |
| price | Price of ice cream per pint in dollars |
| income | Weekly family income in dollars |
| temp | Average temperature in degrees Fahrenheit |
| year | Year within the study (0=1951, 1=1952, 2=1953) |
Source: Koteswara Rao Kadiyala (1970) Testing for the independence
of regression disturbances, Econometrica, 38, pp. 97-117.
Also found in: Hand, D.J., et al. (1994) A Handbook of Small Data Sets, London: Chapman & Hall, p. 214. DASL.
Data: odsherred.txt
Keywords: Multiple linear regression.
Description: These data contain the sales prices of 5 holiday
cottages in Odsherred, Denmark, together with the age and the livable area
of each house.
Number of observations: 5
| Variable | Description |
| price | Price, in DKK 1000 (Danish kroner) |
| age | Age of the house, in years |
| area | Livable area, in square metres |
Source: Data from Nybolig, May 2003.
Data: paintcrack.txt
Keywords: Regression, transformation.
Description: A research study was conducted on cracking of latex
paint on wooden structures. The primary concern in the study is to
investigate the effect of water permeability and fracture energy (energy to
propagate a crack through paint film) on paint crack rating.
Number of observations: 10
| Variable | Description |
| rating | Crack rating (between 0 and 10) |
| permeability | Water permeability |
| energy | Fracture energy |
Source: Milton, J.S. and Arnold, J.C. (1995) Introduction to
Probability and Statistics, 3rd ed., McGraw Hill, p. 524.
Data: pathology.txt
Keywords: Multiple regression.
Description: These data refer to a study of whether the level of
pathology in psychotic patients 6 months after treatment can be predicted
with reasonable accuracy from knowledge of pre-treatment symptom ratings of
thinking disturbance and hostile suspiciousness.
Number of observations: 53
| Variable | Description |
| pathology | Level of pathology |
| thinking | Pre-treatment symptom rating of thinking disturbance |
| suspicious | Pre-treatment symptom rating of hostile suspiciousness |
Source: Kleinbaum, D.G., Kupper, L.L., Muller, K.E., Nizam, A.
(1998) Applied Regression Analysis and Other Multivariable Methods,
3rd Edition, Duxbury Press, Brooks/Cole Publishing Company, p. 125.
Data: silkworm.txt
Keywords: Multiple linear regression.
Description: An experiment was conducted in order to describe the
toxic action of a certain chemical on silkworm larvae. The larvae were fed
various doses of the chemical, and the survival times (i.e. time
until death) were recorded, together with the weights of the larvae.
Number of observations: 15
| Variable | Description |
| survival | Survival times of the larvae |
| dose | Doses of the chemical |
| weight | Weights of the larvae |
Source: Kleinbaum, D.G., Kupper, L.L., Muller, K.E., Nizam, A.
(1998) Applied Regression Analysis and Other Multivariable Methods,
3rd Edition, Duxbury Press, Brooks/Cole Publishing Company.
Data: stream.txt
Keywords: Multiple linear regression, collinearity.
Description: Can Southern California's water supply be predicted
from past rainfall data? One factor affecting water availability is stream
runoff. If runoff could be predicted, engineers, planners and policy makers
could do their jobs more efficiently. The dataset contains 43 years' worth of
precipitation measurements taken at four sites in the Owens Valley. Much of
the water for Southern California is supplied by the Owens Valley Aqueduct.
If the Owens Valley and the nearby Sierra Mountains get little rain, then
water will be low in the aqueduct. The explanatory variables, labeled APSAB,
APSLAKE, OPRC and OPSLAKE, are rainfall measurements for four sites in or
near Owens Valley. The response, labeled BSAAM, is the stream runoff at a
site near Bishop, California. Stream runoff volume is a stand-in for volume
of water delivered to the aqueduct.
The explanatory variables are highly correlated, so two different models
give adequate fits, depending on the test procedure used.
Number of observations: 43
| Variable | Description |
| obs | Observation number |
| year | Year |
| APSAB | Rainfall measurement |
| APSLAKE | Rainfall measurement |
| OPRC | Rainfall measurement |
| OPSLAKE | Rainfall measurement |
| BSAAM | Stream runoff |
Source: Bent Jørgensen.
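Collinearity of the kind described for the rainfall sites can be screened with pairwise correlations and variance inflation factors (VIF). The sketch below, in plain Python, uses invented rainfall figures for two nearby sites rather than the stream.txt data; `site_a`, `site_b` and `pearson_r` are hypothetical names. With exactly two predictors, the VIF of each is 1 / (1 - r²).

```python
import math

def pearson_r(x, y):
    """Sample Pearson correlation coefficient of two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical rainfall at two nearby sites; NOT the stream.txt data.
site_a = [4.0, 6.5, 3.2, 8.1, 5.5, 7.3]
site_b = [4.3, 6.9, 3.0, 8.6, 5.9, 7.5]

r = pearson_r(site_a, site_b)

# With two predictors, regressing one on the other gives R^2 = r^2,
# so the variance inflation factor of each predictor is 1 / (1 - r^2).
vif = 1.0 / (1.0 - r ** 2)
print(round(r, 3), round(vif, 1))
```

A large VIF means the coefficient of that predictor is estimated with much less precision than it would be if the predictors were uncorrelated, which is one reason different selection procedures can settle on different models.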
Data: trees.txt
Keywords: Multiple regression.
Description: In order to find an estimate for the volume of a tree
(and thereby the timber yield), the volumes, heights and diameters were
collected for a sample of 31 black cherry trees in the Allegheny National
Forest, Pennsylvania.
Number of observations: 31
| Variable | Description |
| volume | Volume of the tree (in cubic feet) |
| height | Height of the tree (in feet) |
| diameter | Diameter of the tree (in inches, at 54 inches above ground) |
Source: Atkinson, A.C. (1982) Regression diagnostics,
transformations and constructed variables (with discussion). J. Royal
Statistical Society, Series B, 44, pp. 1-36.
Data: water.txt
Keywords: Multiple linear regression.
Description: A production plant cost-control engineer is
responsible for cost reduction. One of the costly items in his plant is the
amount of water used by the production facilities each month. He decided to
investigate water usage by collecting seventeen observations on his plant's
water usage and other variables.
Number of observations: 17
| Variable | Description |
| temperature | Average monthly temperature (F) |
| production | Amount of production (M pounds) |
| days | Number of plant operating days in the month |
| workers | Number of workers on the monthly plant payroll |
| water | Monthly water usage (gallons) |
Source: OzDASL,
Draper, N.R., and Smith, H. (1981) Applied Regression Analysis, 2nd
Edition, Wiley: New York.
Data: algae.txt
Keywords: ANCOVA.
Description: The data in this project concern the relationship
between biomass (measured as the bio volume) and concentration of the pigment
chlorophyll a, in lakes dominated by three common types of algae. (A
lake is said to be dominated by a specific type of algae if at least 80% of
the total biomass consists of that type.) The data were collected in 17
monitored lakes in the period 1989-99. Information about which lakes the
different measurements were taken from is not available.
Number of observations: 584
| Variable | Description |
| class | Type of algae (kisel, bluegreen or fure) |
| biovolume | Bio volume (in mm3/l) |
| chlorophyll | Concentration of chlorophyll a (in mg/l) |
Source: The data are provided by Anne Lilholt, Institute of
Biology, SDU.
Data: collembola.txt
Keywords: Linear regression, ANCOVA, transformation.
Description: In the ecology of soils, one is interested in, among
other things, measuring the biomass in the topsoil. A traditional measure is
the live weight of small animals that live in the layers of the soil.
However, it is difficult under practical field experiments to obtain the
live weight. Instead the small animals are extracted from the soil and then
dried and the dry weight is determined. Hence we are interested in a model
that predicts the live weight ($LW$) from the dried weight ($DW$).
Traditionally the live weight ($LW$) is calculated as a given percentage
$p$, say, of the dried weight ($DW$), i.e. a functional
relationship of the form
$$LW = p \cdot DW \tag{1.3}$$
where $LW$ is the response and $DW$ is the explanatory variable. A
log-transformation leads to
$$\log LW = \log p + \log DW$$
which results in a simple linear regression model
$$y = \alpha + \beta x + \varepsilon$$
where $y = \log LW$ and $x = \log DW$. Further we
have introduced $\alpha = \log p$. If the proposed functional relationship in (1.3) holds, the slope should be
$\beta = 1$. By re-expressing the model using
the log-transform we can test whether the slope is $1$. Further, we obtain an
estimate of $\alpha$ and from this we can find $p = e^{\alpha}$, the parameter of
interest.
Number of observations: 37
| Variable | Description |
| species | Species of collembola (3 different) |
| logLW | Logarithm of the live weight |
| logDW | Logarithm of the dried weight |
Source: Anvendt Statistik (1983-85), Opgaver, Vol 1-4. (Eds. Andersen, A.H. and Keiding, N.) Department of Theoretical
Statistics, University of Aarhus.
Data: diet.txt
Keywords: Comparing two regression lines.
Description: In a study on the effects of different diets on rats,
two groups of rats (on different diets) were weighed weekly over a four-week
period. (The rats had been on the diets for some time before the four-week
period.)
Number of observations: 48
| Variable |
Description |
| weight |
Weight of rat (in grams) |
| week |
Week number |
| diet |
Diet indicator: diet=1 if rat is on first diet, diet=0 otherwise |
| |
|
Source: Crowder, M.J. and Hand, D.J. (1990) Analysis of
repeated measures, Chapman and Hall, London, p. 19.
Data: market.txt
Keywords: Comparing regression lines.
Description: Market research was conducted for a national retail
company to compare the relationship between sales and advertising during the
warm Spring and Summer seasons with that during the cool Autumn and Winter
seasons. The data were collected over a period of several years.
Number of observations: 18
| Variable |
Description |
| season |
The season (Warm=0, Cool=1) |
| expenditure |
The advertising expenditure (in million $) |
| revenue |
The sales revenue (in million $) |
| |
|
Source: Kleinbaum, D.G., Kupper, L.L., Muller, K.E., Nizam, A.
(1998) Applied Regression Analysis and Other Multivariable Methods,
3rd Edition, Duxbury Press, Brooks/Cole Publishing Company, p. 353.
Data: michelson.txt
Keywords: Comparing regression lines.
Description: In 1879, Michelson undertook a set of experiments in
order to determine the speed of light. A total of 100 sets of experiments
were undertaken over several days (mornings and afternoons) between June 5
and July 2, 1879. In each set of experiments he made 10 speed determinations
and reported the average of the ten values. (This dataset only contains 98
of his measurements.)
The modern value of the speed of light in a vacuum is
299,792.458 km per second, so Michelson's measurements were
impressively accurate, all things considered.
Number of observations: 98
| Variable |
Description |
| speed |
Average of the ten speed values measured in experiment (in km per
second) |
| temp |
Ambient temperature at time of experiment (in degrees Fahrenheit) |
| ampm |
Time of day (am=0, pm=1) |
| |
|
Source: Michelson, A.A. (1880) Experimental determination of the
velocity of light made at the US Naval Academy, Annapolis, Astronomical Papers, 1, 109-145.
Data: photosyn.txt
Keywords: Comparing regression lines.
Description: A study was made into photosynthesis rate and
radiation for three levels (Low, Medium and High) of water availability. It
is desired to determine how similar the relationship between the
photosynthesis rate and radiation is at the three water levels.
Number of observations: 15
| Variable |
Description |
| photosyn |
Photosynthesis rate |
| radiation |
Radiation |
| level1 |
Indicator for low water level (low=1, 0 otherwise) |
| level2 |
Indicator for medium water level (medium=1, 0 otherwise) |
| level |
Water level (low=1, medium=2, high=3) |
| |
|
Source: Krzanowski, W.J. (1998) An Introduction to
Statistical Modelling, London: Arnold, p. 107.
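The coding of photosyn.txt lends itself to a short illustration of how regression lines can be compared using indicator variables. The sketch below shows only one possible parameterisation, with the high water level as the reference group. The function names `indicators` and `design_row` are hypothetical, and the radiation value is made up rather than taken from the dataset.

```python
def indicators(level):
    """Map level (low=1, medium=2, high=3) to the pair (level1, level2)."""
    if level not in (1, 2, 3):
        raise ValueError("level must be 1, 2 or 3")
    return (1 if level == 1 else 0, 1 if level == 2 else 0)


def design_row(radiation, level):
    """Design-matrix row for a model with a separate line per water level.

    Columns: intercept, radiation, level1, level2,
    level1*radiation, level2*radiation.
    """
    level1, level2 = indicators(level)
    return [1.0, radiation, level1, level2,
            level1 * radiation, level2 * radiation]


# Hypothetical radiation value, used only to show the layout of a row.
print(design_row(10.0, 2))
```

In this parameterisation, equal slopes at the three water levels correspond to both interaction columns having zero coefficients, and identical lines correspond to the indicator columns having zero coefficients as well.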
Data: pressure.txt
Keywords: Linear regression, ANCOVA, same intercept.
Description: It is well-known that blood pressure increases with
age. In this dataset we examine this relation. Age and systolic blood
pressure were measured for 28 males, of whom 15 are university lecturers
and the remaining 13 are journalists. Besides the
overall increase in systolic blood pressure with age, we can compare the regression
lines between the two groups. [Note: data are not genuine.]
Number of observations: 28
| Variable |
Description |
| occupation |
Occupation (0=journalist, 1=university lecturer) |
| age |
Age in years |
| systolic |
Systolic blood pressure in mmHg |
| |
|
Source: Blæsild, P. and Granfeldt, J., (1995), Statistik for
biologer og geologer, Det Naturvidenskabelige Fakultet, University of
Aarhus, Denmark.
Data: rust.txt
Keywords: Comparing regression lines.
Description: For a number of chondrites, the age and type (Type I
or Type II) were noted, and the rust component was measured. Initially, the
research question concerned the dependence of rustiness on age. A further
question is whether there is a difference between the two types of
chondrites, Type I and Type II.
Number of observations: 82
| Variable |
Description |
| type |
Type of chondrite (Type I=0, Type II=1) |
| age |
Age of chondrite (in years) |
| rust |
Rust component (in percent) |
| |
|
Source: Dr T.B. Smith, The Open University.
Data: twins.txt
Keywords: Comparing regression lines.
Description: These data concern IQ scores of identical twins, one
raised in a foster home and the other raised by natural parents. The data
are divided into three groups according to the social class of the natural
parents.
Number of observations: 27
| Variable |
Description |
| foster |
IQ score of the twin raised in a foster home |
| natural |
IQ score of the twin raised by natural parents |
| high |
Twins from high social class (high=1, 0 otherwise) |
| middle |
Twins from middle social class (middle=1, 0 otherwise) |
| class |
Social class (high=1, middle=2, low=3) |
| |
|
Source: Weisberg, S. (1985) Applied linear regression,
John Wiley & Sons, p. 180.
Data: uffi.txt
Keywords: Comparing regression lines.
Description: The data were collected to see if the presence of urea
formaldehyde foam insulation (UFFI) had an effect on the formaldehyde
concentration in homes.
Number of observations: 24
| Variable |
Description |
| formaldehyde |
Average concentration of formaldehyde measured over a week
(in ppb) |
| airtightness |
A measure of airtightness (calculated from several other
measurements) |
| uffi |
Indicator for whether or not UFFI was used (uffi=1 if used, uffi=0
otherwise) |
| |
|
Source: Jørgensen, B. (1993) The Theory of Linear Models,
Chapman & Hall, p. 120. (Originally supplied by R.J. MacKay.)
Data: burns.txt
Keywords: Logistic regression.
Description: These data refer to 435 adults who were treated for
third-degree burns by the University of Southern California General Hospital
Burn Center. The patients were grouped according to the area of third-degree
burns on the body. (The groups are identified as midpoints of set intervals
of log(area +1).) For each patient, it was recorded whether or not they
survived, and the area of their burn was recorded as the midpoint of the
group corresponding to their burn.
Number of observations: 435
| Variable |
Description |
| midpoint |
Midpoint of the group corresponding to the patient's burn |
| survive |
Binary variable: survived=1, died=0 |
| |
|
Source: Fan, J., Heckman, N.E. and Wand, M.P. (1995) Local
polynomial kernel regression for generalised linear models and
quasi-likelihood functions, Journal of the American Statistical
Association, 90, pp. 141-50.
Data: convict.txt
Keywords: Logistic regression.
Description: A study was carried out on factors related to a
criminal conviction after treatment for drug abuse. For each of sixty
people who had taken part in a drug rehabilitation programme, it was
recorded whether they had a 'short' education (15 years or less) or a
'long' education (more than 15 years). It was also recorded whether or
not they had a post-treatment conviction.
Number of observations: 60
| Variable |
Description |
| education |
Categorical variable identifying length of education (more than
15 years=1, 15 years or less=0) |
| convicted |
Binary variable for post-treatment conviction: convicted=1,
not-convicted=0 |
| |
|
Source: Wilson, S. and Mandelbrote, B. (1978) Drug rehabilitation
and criminality, British J. Criminology, 18, pp. 381-386.
Data: cows.txt
Keywords: Logistic regression.
Description: An experiment was carried out to investigate the
effect of small electrical currents on farm animals. The eventual goal was
to understand the effects of high-voltage powerlines on livestock. The
experiment was carried out with 7 cows, and 6 shock intensities: 0, 1, 2, 3,
4 and 5 milliamps. (Shocks on the order of 15 milliamps are painful for many
humans.) Each cow was given 30 shocks, five at each intensity, in random
order. The entire experiment was then repeated, so each cow received a total
of 60 shocks. For each shock, the response, mouth movement, was either
present or absent. (These data are the same as the data in cows2.txt.)
Number of observations: 420
| Variable |
Description |
| current |
Shock intensity (in milliamps) |
| movement |
Binary variable: movement=1, no-movement=0 |
| |
|
Source: Weisberg, S. (1985) Applied Linear Regression,
Wiley.
Data: cows2.txt
Keywords: Logistic regression.
Description: An experiment was carried out to investigate the
effect of small electrical currents on farm animals. The eventual goal was
to understand the effects of high-voltage powerlines on livestock. The
experiment was carried out with 7 cows, and 6 shock intensities: 0, 1, 2, 3,
4 and 5 milliamps. (Shocks on the order of 15 milliamps are painful for many
humans.) Each cow was given 30 shocks, five at each intensity, in random
order. The entire experiment was then repeated, so each cow received a total
of 60 shocks. For each shock, the response, mouth movement, was either
present or absent.
It was thought that the reactions of the cows would depend on the shock
intensity, but also that they might differ slightly between the first
experiment and the repeated experiment, due to fatigue of the animals or
to learning. (These data are the same as the data in cows.txt.)
Number of observations: 420
| Variable |
Description |
| current |
Shock intensity (in milliamps) |
| movement |
Binary variable: movement=1, no-movement=0 |
| trial |
Categorical variable identifying the trial (1= first experiment, 2=
repeated experiment) |
| |
|
Source: Weisberg, S. (1985) Applied Linear Regression,
Wiley.
Data: insecticides.txt
Keywords: Logistic regression.
Description: In a trial of three insecticides, batches of about
fifty insects were exposed to varying deposits of each insecticide.
Number of observations: 882
| Variable |
Description |
| killed |
Binary variable: killed=1, not-killed=0 |
| insecticide |
Categorical variable identifying insecticide (numbered 1 to 3) |
| deposit |
Amount of deposit (in milligrams) |
| |
|
Source: Krzanowski, W.J. (1998) An Introduction to
Statistical Modelling, London: Arnold. pp. 198-9.
Data: pill.txt
Keywords: Logistic regression.
Description: The link between use of oral contraceptives and the
incidence of myocardial infarction was investigated. For each of 224 women,
it was recorded whether or not they were using the oral contraceptive and
whether or not they suffered a myocardial infarction.
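With a single binary explanatory variable such as pill, the slope of a logistic regression equals the log odds ratio of the corresponding 2x2 table, so a fitted model can be checked by hand. The following is a minimal sketch: the function name `log_odds_ratio` is hypothetical and the counts are invented for illustration, not taken from pill.txt.

```python
import math


def log_odds_ratio(a, b, c, d):
    """Log odds ratio and its large-sample standard error for a 2x2 table.

    a: exposed with event      b: exposed without event
    c: unexposed with event    d: unexposed without event
    """
    log_or = math.log((a * d) / (b * c))
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    return log_or, se


# Hypothetical counts (illustration only).
log_or, se = log_odds_ratio(10, 20, 15, 60)
lower = log_or - 1.96 * se   # approximate 95% interval on the log scale
upper = log_or + 1.96 * se
print(math.exp(log_or), math.exp(lower), math.exp(upper))
```

The intercept of the same logistic model is the log odds of the event in the unexposed group, here log(c/d).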
Number of observations: 224
| Variable |
Description |
| infarction |
Binary variable: infarction=1, no-infarction=0 |
| pill |
Categorical variable for whether or not the pill is used (using
pill=1, not using pill=0) |
| |
|
Source: Mann, J.I., Vesey, M.P., Thorogood, M. and Doll, R. (1975)
British J. Medicine, 2, 241-245.
Data: sirds.txt
Keywords: Logistic regression.
Description: This data set contains the birth weights of fifty
infants who exhibited severe idiopathic respiratory distress syndrome
(SIRDS). This is a serious condition that may result in death, and in fact
of the fifty children sampled only 23 survived.
Number of observations: 50
| Variable |
Description |
| birthweight |
Weight at birth (in kg) |
| survival |
Binary variable: survived=1, died=0 |
| |
|
Source: van Vliet, P.K. and Gupta, J.M. (1973) Sodium bicarbonate
in idiopathic respiratory distress syndrome, Archives of Disease in
Childhood, 48, pp. 249-255.
Data: snoring.txt
Keywords: Logistic regression.
Description: A study was undertaken to investigate whether snoring
is related to heart disease. In the survey, 2484 people were classified
according to their proneness to snoring (never, occasionally, often, always)
and whether or not they had heart disease.
Number of observations: 2484
| Variable |
Description |
| disease |
Binary variable: having disease=1, not having disease=0 |
| snoring |
Categorical variable indicating level of snoring (never=1,
occasionally=2, often=3 and always=4) |
| |
|
Source: Norton, P.G. and Dunn, E.V. (1985) Snoring as a risk factor
for disease: an epidemiological survey, British Medical Journal,
291, pp. 630-632.
Data: tonsil.txt
Keywords: Logistic regression.
Description: Some individuals are carriers of the bacterium
Streptococcus pyogenes. An investigation was made into the possible
relationship between carrier status and tonsil size in schoolchildren. A
total of 1398 children were examined and classified according to tonsil size
(normal, large and very large) and to whether or not they were carriers.
Number of observations: 1398
| Variable |
Description |
| carrier |
Binary variable: carrier=1, non-carrier=0 |
| tonsil |
Categorical variable indicating tonsil size (normal=1, large=2 and
very large=3) |
| |
|
Source: Krzanowski, W. (1988) Principles of multivariate
analysis, Oxford University Press, Oxford, p. 269.
Data: vaso.txt
Keywords: Logistic regression.
Description: A study was made into the effect of volume and rate of
air inspired by human subjects on the occurrence of transient
vasoconstriction in the skin of the fingers. A total of 39 observations were
obtained on these variables from 3 subjects in a laboratory. The data are
assumed to be independent (including those on the same subject).
Number of observations: 39
| Variable |
Description |
| volume |
Volume of air inspired by subject. |
| rate |
Rate of air inspired by subject. |
| survive |
Binary variable: occurrence of transient vasoconstriction in the
skin of the fingers=1, no occurrence=0 |
| |
|
Source: Krzanowski, W.J. (1998) An Introduction to
Statistical Modelling, London: Arnold. pp. 201-2.
Data: braindom.txt
Keywords: One-way ANOVA.
Description: A study was made into how different kinds of brain
dominance (left-brained, right-brained or integrative (=both)) affect the
ability to recall information of various types. These data refer to an
experiment in which subjects were asked to recall information presented to
them in tabular form about the numbers of doctors practising in various US
states. The subjects were divided into three groups, depending on whether
they were predominately left-brained (active, verbal, logical; Group 1),
right-brained (receptive, spatial, intuitive; Group 2) or integrative (both,
Group 3).
Number of observations: 24
| Variable |
Description |
| score |
Score in recall test |
| brain |
Type of brain dominance (Group 1: left, Group 2: right, Group 3:
both) |
| |
|
Source: Brown, T.S. and Evans, J.K. (1986) Hemispheric
dominance and recall following graphical and tabular presentation of
information, Proceedings of the 1986 Annual Meeting of the Decision
Sciences Institute, 1, p. 598.
Data: cardiac.txt
Keywords: One-way ANOVA.
Description: Data were collected from an experiment designed to
compare the relative potencies (dosages at death) of four cardiac
substances. A suitable dilution of one of the substances was slowly infused
into an anesthetised guinea pig, and the dosage at which the guinea pig died
was recorded.
Number of observations: 40
| Variable |
Description |
| potency |
Dosage at which the guinea pig died from substance |
| substance |
Categorical variable identifying the substance |
| |
|
Source: Kleinbaum, D.G., Kupper, L.L., Muller, K.E. and Nizam, A.,
(1998), Applied Regression Analysis and Other Multivariable Methods,
Duxbury Press, Brooks/Cole, p. 445.
Data: cereals.txt
Keywords: Factorial experiment, unequal cell numbers.
Description: A manufacturer conducted a pricing experiment to
explore the effects of price decreases on sales of one of its breakfast
cereals. The two largest supermarket chains in a particular market
participated in the experiment. Ten stores from each chain were randomly
selected, and each store was assigned a price level for the cereal (either
the original price, or a 10% reduced price). If the competing chain had a
store in the same vicinity, both stores were assigned the same price
level. Some stores failed to complete the experiment due to competition from
other supermarket chains. Sales volumes over the period of the study were
noted for each of the 17 stores completing the experiment.
Number of observations: 17
| Variable |
Description |
| sales |
Sales volumes (in hundreds of units) |
| chain |
Categorical variable identifying the chain (numbered 1 and 2) |
| price |
Categorical variable identifying the price level (original price=1,
reduced price=2) |
| |
|
Source: Kleinbaum, D.G., Kupper, L.L., Muller, K.E. and Nizam, A.,
(1998), Applied Regression Analysis and Other Multivariable Methods,
Duxbury Press, Brooks/Cole, p. 583.
Data: chickfeed.txt
Keywords: Randomised complete block design.
Description: In an experiment, a drug was added to the feed of
chicks in an attempt to promote growth. The aim of the experiment was to
compare the effects of three types of feed: standard feed (a control),
standard feed with a low dose of the drug added, and standard feed with a high
dose of the drug added. It was thought that the position of the chicks in the bird
house might also influence their growth (due to variation in
lighting, ventilation, etc.), so the chicks in the experiment were
grouped in eight blocks according to their location in the bird house. Each
type of feed was fed to one unit (of three chicks) from each block.
Within each block, the three units were randomly allocated to the three
types of feed. When the chicks had matured, the average weight per chick in
each unit was recorded.
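The analysis of a randomised complete block design like this one splits the total variation into block, treatment and error parts. The sketch below shows that decomposition under the assumption of one response per block and treatment. The function name `rcbd_anova` is hypothetical, and the small table of weights (4 blocks by 3 feeds) is invented for illustration rather than taken from chickfeed.txt.

```python
def rcbd_anova(table):
    """Sums of squares for a randomised complete block design.

    table[i][j] is the single response for block i and treatment j.
    """
    n_blocks = len(table)
    n_treat = len(table[0])
    grand = sum(sum(row) for row in table) / (n_blocks * n_treat)
    block_means = [sum(row) / n_treat for row in table]
    treat_means = [sum(row[j] for row in table) / n_blocks
                   for j in range(n_treat)]

    ss_total = sum((v - grand) ** 2 for row in table for v in row)
    ss_block = n_treat * sum((m - grand) ** 2 for m in block_means)
    ss_treat = n_blocks * sum((m - grand) ** 2 for m in treat_means)
    ss_error = ss_total - ss_block - ss_treat   # what blocks and treatments leave over

    df_treat = n_treat - 1
    df_error = (n_blocks - 1) * (n_treat - 1)
    f_treat = (ss_treat / df_treat) / (ss_error / df_error)
    return {"ss_block": ss_block, "ss_treat": ss_treat, "ss_error": ss_error,
            "df_treat": df_treat, "df_error": df_error, "f_treat": f_treat}


# Hypothetical average weights: rows are blocks, columns are the three feeds.
weights = [
    [3.9, 4.1, 4.3],
    [3.7, 4.0, 4.1],
    [3.8, 4.2, 4.2],
    [3.6, 3.9, 4.2],
]
result = rcbd_anova(weights)
print(result)
```

With the eight blocks and three feeds described for chickfeed.txt, the same formulas would give (8-1)(3-1) = 14 error degrees of freedom.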
Number of observations: 24
| Variable |
Description |
| weight |
Average weight per chick in unit (in pounds) |
| feed |
Type of feed (standard=1, low-dose=2, high-dose=3) |
| position |
Position of chick-unit (numbered 1 to 8) |
| |
|
Source: S.M. Free in Snee, R.D. (1985) Graphical display of results
of three treatments randomized block experiment, Applied Statistics,
34, pp. 71-7.
Data: coalseam.txt
Keywords: One-way ANOVA.
Description: A study was made to compare the sulphur content of the
five major coal seams (numbered 1 to 5) in a particular region.
Number of observations: 42
| Variable |
Description |
| sulphur |
Sulphur content |
| seam |
Factor: level corresponds to coal seam label |
| |
|
Source: Milton, J.S. & Arnold, J.C. (1995) Introduction to
probability and statistics, McGraw Hill International Editions, p. 539.
Data: diabetes.txt
Keywords: One-way ANOVA.
Description: In an experiment, three groups of mice (Group 1:
normal mice treated with a placebo, Group 2: alloxan-diabetic mice treated
with a placebo, Group 3: alloxan-diabetic mice treated with insulin) were
injected with 5 mg BSA antigen on days 0 and 28. On day 39, the amount of
nitrogen-bound bovine serum albumin produced by the mice was measured.
Number of observations: 57
| Variable |
Description |
| antibody |
Amount of nitrogen-bound bovine serum albumin produced by mouse
(in micrograms per ml of undiluted mouse serum) |
| group |
Categorical variable identifying group (normal/placebo=1,
diabetic/placebo=2, diabetic/insulin=3) |
| |
|
Source: Dolkart, R.E., Halpern, B. and Perlman, J. (1971)
Comparison of antibody responses in normal and alloxan diabetic mice, Diabetes, 20, pp. 162-167.
Data: dopamine.txt
Keywords: Two-sample t-test.
Description: In a study into the causes of schizophrenia, 25
hospitalized schizophrenic patients were treated with anti-psychotic
medication. After a period of time, they were classified as psychotic or
non-psychotic. Samples of cerebrospinal fluid were taken from each patient
and tested for dopamine b-hydroxylase enzyme activity.
Number of observations: 25
| Variable |
Description |
| dopamine |
Dopamine b-hydroxylase enzyme activity in patient |
| state |
Indicator variable: 1 if psychotic, 2 if non-psychotic |
| |
|
Source: Sternberg, D.E., van Kammen, D.P. and Bunney, W.E. (1982)
Schizophrenia: dopamine b-hydroxylase activity and treatment response, Science, 216, pp. 1423-1425.
Data: doughnuts.txt
Keywords: One-way ANOVA.
Description: These data concern the amount of fat absorbed by
doughnuts when cooked. For each of four different types of fat (numbered 1
to 4), six batches of 24 doughnuts each were cooked. The total
amount of fat absorbed by each batch was recorded.
Number of observations: 24
| Variable |
Description |
| absorb |
Total amount of fat absorbed by batch (in grams) |
| fat |
Categorical variable identifying the type of fat |
| |
|
Source: Snedecor, G.W. and Cochran, W.G. (1967) Statistical
Methods, 6th edition, Ames (IA), Iowa State University Press.
Data: insects.txt
Keywords: One-way ANOVA.
Description: A study was made to compare the effectiveness of six
different insecticides (numbered 1 to 6). For each
insecticide, twelve batches of 50 insects were exposed to the insecticide
for a fixed length of time. The numbers of insects in the batches still
alive (of the 50) after the exposure time were recorded.
Number of observations: 72
| Variable |
Description |
| alive |
Number of insects in the batches still alive after the exposure time |
| insecticide |
Categorical variable identifying the insecticide |
| |
|
Source: Lunneborg, C.E. (1994) Modeling Experimental and
Observational Data, Duxbury Press, Ca., p. 150.
Data: iron.txt
Keywords: Transformations, ANOVA.
Description: An experiment was performed to determine whether two
forms of iron (Fe$^{2+}$ and Fe$^{3+}$) are retained differently. (If one
form of iron were retained especially well, it would be the better dietary
supplement.) The investigators divided 108 mice randomly into 6 groups of 18
each; 3 groups were given Fe$^{2+}$ in three different concentrations, 10.2,
1.2 and 0.3 millimolar, and 3 groups were given Fe$^{3+}$ at the same three
concentrations. The mice were given the iron orally; the iron was
radioactively labeled so that a counter could be used to measure the initial
amount given. At a later time, another count was taken for each mouse, and
the percentage of iron retained was calculated.
Number of observations: 108
| Variable |
Description |
| retain |
Percentage of iron retained |
| iron |
Categorical variable identifying iron form (Fe$^{2+}$=1 and Fe$^{3+}$=2) |
| concentration |
Categorical variable identifying concentration (levels 1, 2
and 3 correspond to 10.2, 1.2 and 0.3 millimolar, respectively) |
| |
|
Source: Rice, J.A. (1988) Mathematical Statistics and Data
Analysis, Wadsworth & Brooks/Cole, p. 357.
Data: laboratory.txt
Keywords: One-way ANOVA.
Description: A large number of laboratories are regularly used to
measure the amount of toxic substances in various materials. There is
concern that results not only vary due to normal measurement variability,
but that there may be substantial variability due to different laboratory
techniques. If true, this might raise a need for enforcing one 'standard'
procedure for all laboratories. To test this concern, four laboratories were
randomly selected and asked to measure the content of a certain chemical.
Each laboratory was given six identical samples for testing.
Number of observations: 24
| Variable |
Description |
| chemical |
Measured content of chemical (in parts per million) |
| laboratory |
Categorical variable identifying the laboratory |
| |
|
Source: Milton, J.S. & Arnold, J.C. (1995) Introduction to
probability and statistics, McGraw Hill International Editions, p. 593.
Data: oysters.txt
Keywords: One-way ANOVA.
Description: It was known that a toxic material was dumped in a
river leading into a large salt water commercial fishing area. The way the
water carried the toxic material was studied by measuring the amount of the
toxic material (in parts per million) found in oysters harvested at three
different locations, ranging from the estuary out into the bay where the
majority of commercial fishing was carried out.
Number of observations: 24
| Variable |
Description |
| toxic |
Toxic material in oysters |
| site |
Site at which the oysters were harvested |
| |
|
Source: Milton, J.S. & Arnold, J.C. (1995) Introduction to
probability and statistics, McGraw Hill International Editions, p. 585.
Data: soya.txt
Keywords: One-way ANOVA.
Description: In an experiment on the effect of stress on the growth
of soya beans, 52 soya beans were grown under different types of stress.
Some were shaken for twenty minutes every day (Group 4), some were grown in
semi-darkness (Group 1), some were grown in semi-darkness and shaken for
twenty minutes each day (Group 2). A control group was grown without any
imposed stress (Group 3). After sixteen days of growth, the leaf area of
each plant was measured. (These data are the same as the data in soya2.txt.)
Number of observations: 52
| Variable |
Description |
| leafarea |
Leaf area after 16 days |
| stress |
Categorical variable identifying the group |
| |
|
Source: Blæsild, P. and Granfeldt, J., (1995), Statistik for
biologer og geologer, Det Naturvidenskabelige Fakultet, University of
Aarhus, Denmark.
Data: voltmeter.txt
Keywords: One-way ANOVA.
Description: A utility company has a large stock of voltmeters that
are used interchangeably by many employees. A study is conducted to detect
differences among the average readings given by these voltmeters. If it
appears that differences do exist, then all the meters in stock will be
calibrated. A random sample of six meters is selected from stock and four
readings are taken for each meter. The response variable is the difference
between the meter reading and the known voltage being applied at the time of
the reading.
Number of observations: 24
| Variable |
Description |
| reading |
Difference between the meter reading and the known voltage |
| voltmeter |
Categorical variable identifying the voltmeter |
| |
|
Source: Milton, J.S. & Arnold, J.C. (1995) Introduction to
probability and statistics, McGraw Hill International Editions, p. 575.
Data: adhesive.txt
Keywords: Two-way ANOVA.
Description: An experiment was made to investigate the effect of
temperature and humidity on the force required to separate an adhesive
product from a certain material. Four specific temperatures and two specific
humidities were of interest in the experiment. (Thus, the factors are
fixed!)
Number of observations: 24
| Variable |
Description |
| force |
Force required to separate the adhesive product from the material |
| temperature |
Categorical variable identifying the temperature level |
| humidity |
Categorical variable identifying the humidity level |
| |
|
Source: Milton, J.S. & Arnold, J.C. (1995) Introduction to
probability and statistics, McGraw Hill International Editions, p. 607.
Data: ads.txt
Keywords: Two-way ANOVA.
Description: An advertising company evaluated three types of
television advertisements for a new, low-cost car: visual appeal ads, budget
appeal ads, and feature appeal ads. To control for age differences, viewers
from four age groups were chosen to evaluate the persuasiveness of the ads
(as measured on a scale from 1 to 10, where 1 represented the lowest level
of persuasion, and 10 the highest). For each type of advertisement, two
viewers from each age group were asked to evaluate the ad.
Number of observations: 24
| Variable |
Description |
| score |
Score of the ad |
| type |
Categorical variable identifying the ad type (Visual=1, Budget=2,
Feature=3) |
| age |
Categorical variable identifying the viewer's age ('18-25'=1,
'26-35'=2, '36-45'=3, '46 and older'=4) |
| |
|
Source: Kleinbaum, D.G., Kupper, L.L., Muller, K.E. and Nizam, A.,
(1998), Applied Regression Analysis and Other Multivariable Methods,
Duxbury Press, Brooks/Cole, p. 557.
Data: dye.txt
Keywords: $2^3$ factorial experiment.
Description: A study was conducted on the effect of temperature,
time in process and rate of temperature rise on the amount of dye left in
the residue bath after a dyeing process. The experiment was run at two levels
of temperature (120°C, 135°C), two levels of time in process (30 minutes, 60
minutes) and two levels of rate of temperature rise. The
experiment was run as a $2^3$ factorial experiment with two replications.
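The layout of such a design, three factors at two levels each with two replications, can be enumerated directly, which also confirms the count of 16 observations. This is a sketch only: the level codes 1 and 2 follow the variable table for dye.txt, while the dictionary keys and the `replicate` field are illustrative choices.

```python
from itertools import product

levels = (1, 2)        # each factor is coded 1 or 2
replications = 2

# One entry per run: every combination of the three factors, repeated twice.
runs = [
    {"temp": temp, "time": time, "rate": rate, "replicate": rep}
    for rep in range(1, replications + 1)
    for temp, time, rate in product(levels, repeat=3)
]

print(len(runs))  # 16
```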
Number of observations: 16
| Variable |
Description |
| dye |
Amount of dye left in the residue bath (in milligrams) |
| temp |
Categorical variable identifying the temperature level (120°C=1,
135°C=2) |
| time |
Categorical variable identifying the time in process (30 mins=1, 60
mins=2) |
| rate |
Categorical variable identifying the rate of temperature rise (two
levels, coded 1 and 2) |
| |
|
Source: Milton, J.S. & Arnold, J.C. (1995) Introduction to
probability and statistics, McGraw Hill International Editions, p. 629.
Data: experience.txt
Keywords: Randomised complete block design.
Description: A study was made to investigate the effect of
experience on the average time required to complete an assembly task on an
assembly line. (If experience is found to have an effect, a training program
will be set up for new employees.) For each of eight different assembly
tasks, four employees with 1, 2, 3 and 4 years of experience, respectively,
were randomly selected to complete the task. The times it took to complete
the tasks were recorded. The experiment was set up as a randomised complete
block design with tasks as blocks and years of experience as factor levels.
Number of observations: 32
| Variable |
Description |
| time |
Time it takes to complete the assembly task |
| experience |
Number of years of experience |
| task |
Categorical variable identifying the task (numbered 1 to 8) |
| |
|
Source: Milton, J.S. & Arnold, J.C. (1995) Introduction to
probability and statistics, McGraw Hill International Editions, p. 591.
Data: influenza.txt
Keywords: Two-way ANOVA.
Description: In a laboratory working on influenza virus, there were
three different operators of photoelectric titration equipment and two
different methods of performing the titration. The methods involve several
dilutions of the virus preparation; either a single pipette could be used
for every dilution, or a fresh pipette could be used for each dilution. What
was being studied was not the virus preparations themselves, but the
operators and the measurement methods. In fact, apart from measurement
variability caused by having different operators and different methods,
there was no reason for the responses to differ, since all the measurements
were made on samples drawn from the same virus preparation.
Number of observations: 24
| Variable |
Description |
| measure |
Titration measurements |
| operator |
Categorical variable identifying operator (numbered 1 to 3) |
| pipette |
Categorical variable relating to pipettes: single pipette=1,
multiple pipettes=2 |
| |
|
Source: Osborn, J.F. (1979) Statistical Exercises in Medical
Research, Oxford, Blackwell Scientific Publications.
Data: leafpack.txt
Keywords: Two-way ANOVA.
Description: Decomposition of leaf packs was measured (in terms of
weight loss of the leaf packs) in four different environments after 1, 2 and
3 months of exposure.
Number of observations: 24
| Variable |
Description |
| decomp |
Weight loss of leaf pack (in grams) |
| environment |
Categorical variable identifying the environment (numbered 1
to 4) |
| time |
Categorical variable identifying the time length (1, 2 or 3 months,
respectively) |
| |
|
Source: Milton, J.S. & Arnold, J.C. (1995) Introduction to
probability and statistics, McGraw Hill International Editions, p. 639.
Data: phone.txt
Keywords: ANOVA, unequal cell numbers.
Description: The manager of a market research company conducted an
experiment to investigate the productivity of three employees on each of two
computerised data-entry systems. The employees conducted phone surveys,
entering the survey data into the computer during the phone call.
Productivity was measured as the time taken to complete a call in which the
respondent agreed to complete the survey. Each employee used each system for
one hour, and the order of use was randomised.
Number of observations: 51
| Variable |
Description |
| time |
Time taken to complete call (in minutes) |
| employee |
Categorical variable identifying employee (numbered 1 to 3) |
| system |
Categorical variable identifying system (numbered 1 and 2) |
| |
|
Source: Kleinbaum, D.G., Kupper, L.L., Muller, K.E. and Nizam, A.,
(1998), Applied Regression Analysis and Other Multivariable Methods,
Duxbury Press, Brooks/Cole, p. 584.
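The keyword "unequal cell numbers" can be seen from the counts alone: 51 observations cannot be split evenly over the 3 x 2 = 6 employee-by-system cells. A minimal sketch of how one might tabulate the cell sizes is given below. The helper names `cell_counts` and `is_balanced` are hypothetical, and the call times are made up rather than read from phone.txt.

```python
from collections import Counter


def cell_counts(rows):
    """Count observations in each (employee, system) cell."""
    return Counter((employee, system) for _, employee, system in rows)


def is_balanced(counts):
    """True when every cell holds the same number of observations."""
    return len(set(counts.values())) == 1


# Made-up call times (minutes) in the layout (time, employee, system).
calls = [
    (4.2, 1, 1), (5.1, 1, 1), (3.9, 1, 2),
    (6.0, 2, 1), (5.5, 2, 2), (4.8, 2, 2), (5.2, 2, 2),
    (3.7, 3, 1), (4.4, 3, 2),
]
counts = cell_counts(calls)
for cell in sorted(counts):
    print(cell, counts[cell])
print("balanced:", is_balanced(counts))
```

When the cells are unequal, the sums of squares for the two factors are in general no longer orthogonal, so the order in which terms enter the model can affect the ANOVA table.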
Data: pinetree.txt
Keywords: Mixed-effect two-way ANOVA.
Description: The diameters of three species of pine trees were
compared at each of four locations using samples of five trees per species
at each location.
Number of observations: 60
| Variable |
Description |
| diameter |
Diameter of pine tree |
| species |
Categorical variable identifying the species (numbered 1 to 3) |
| location |
Categorical variable identifying the location (numbered 1 to 4) |
| |
|
Source: Kleinbaum, D.G., Kupper, L.L., Muller, K.E. and Nizam, A.,
(1998), Applied Regression Analysis and Other Multivariable Methods,
Duxbury Press, Brooks/Cole, p. 552.
Data: pistons.txt
Keywords: Randomised complete block design.
Description: These data concern the number of failures of piston
rings for each of three legs of four steam-driven compressors.
Number of observations: 12
| Variable |
Description |
| piston |
Number of failures of piston rings |
| leg |
Position of leg (North=1, Centre=2, South=3) |
| compressor |
Categorical variable identifying compressor (numbered 1 to 4) |
| |
|
Source: Davies, O.L. and Goldsmith, P.L. (1972) Statistical
Methods in Research and Production, 4th edn. Oliver and Boyd, Edinburgh, p.
324.
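A randomised complete block analysis of a layout like this one could be sketched as follows, assuming the compressors act as blocks and the legs as treatments (the source does not say which is which). The function name `rcbd_anova` is hypothetical, and the failure counts are made up for illustration, not taken from pistons.txt.

```python
def rcbd_anova(table):
    """ANOVA for a randomised complete block design.

    table[i][j] is the response for block i and treatment j, with exactly
    one observation per block-treatment combination.
    """
    n_blocks = len(table)
    n_treat = len(table[0])
    if any(len(row) != n_treat for row in table):
        raise ValueError("every block needs one value per treatment")

    grand = sum(sum(row) for row in table) / (n_blocks * n_treat)
    block_means = [sum(row) / n_treat for row in table]
    treat_means = [
        sum(row[j] for row in table) / n_blocks for j in range(n_treat)
    ]

    ss_blocks = n_treat * sum((m - grand) ** 2 for m in block_means)
    ss_treat = n_blocks * sum((m - grand) ** 2 for m in treat_means)
    ss_total = sum((y - grand) ** 2 for row in table for y in row)
    ss_error = ss_total - ss_blocks - ss_treat

    df_treat = n_treat - 1
    df_error = (n_blocks - 1) * (n_treat - 1)
    f_treat = (ss_treat / df_treat) / (ss_error / df_error)
    return {
        "ss_blocks": ss_blocks,
        "ss_treat": ss_treat,
        "ss_error": ss_error,
        "ss_total": ss_total,
        "df_treat": df_treat,
        "df_error": df_error,
        "f_treat": f_treat,
    }


# Made-up failure counts: rows are compressors 1-4, columns are legs 1-3.
failures = [
    [6, 4, 8],
    [5, 3, 9],
    [7, 6, 7],
    [4, 5, 10],
]
result = rcbd_anova(failures)
print(result)
```

With 4 blocks and 3 treatments the error term has (4 - 1)(3 - 1) = 6 degrees of freedom, which is all that remains of the 12 observations after fitting the grand mean, blocks and treatments.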
Data: protein.txt
Keywords: Two-way ANOVA.
Description: Six groups, each of ten rats, were fed on diets which
differed according to source of protein and amount of protein in diet. The
weight gain for each rat was recorded.
Number of observations: 60
| Variable |
Description |
| weight |
Weight gain (in grams) |
| protein |
Protein source: Beef=1, Cereal=2, Pork=3 |
| amount |
Amount of protein: Low=1, High=2 |
| |
|
Source: Snedecor, G.W. and Cochran, W.G. (1967) Statistical
Methods, 6th edn. Iowa State University Press, p. 347.
Data: sexism.txt
Keywords: Two-way ANOVA.
Description: A study was conducted to compare the sexist attitudes
of students at various types of colleges in the US. The college types are:
mixed (gender) college with at least 75% male students, mixed college with
less than 75% male students, and single-sex college. For each gender,
random samples of 10 undergraduate students were selected from each of
the three types of colleges. Each student filled in a questionnaire, from
which a score for `degree of sexism' was determined, defined as the extent
to which a student considered males and females to have different life
roles (the higher the score, the more sexist the attitude).
Number of observations: 60
| Variable |
Description |
| sexism |
Score for `degree of sexism' |
| type |
Categorical variable identifying the college type (mixed with at least
75% males=1, mixed with less than 75% males=2, single sex=3) |
| gender |
Categorical variable identifying the student's gender (male=1,
female=2) |
| |
|
Source: Kleinbaum, D.G., Kupper, L.L., Muller, K.E. and Nizam, A.,
(1998), Applied Regression Analysis and Other Multivariable Methods,
Duxbury Press, Brooks/Cole, p. 549.
Data: soya2.txt
Keywords: Two-way ANOVA.
Description: In an experiment on the effect of stress on the growth
of soya beans, 52 soya beans were grown under two different types of stress:
some were shaken for twenty minutes every day, some were grown in
semi-darkness. After sixteen days of growth, the leaf area of each plant was
measured. (These data are the same as the data in soya.txt.)
Number of observations: 52
| Variable |
Description |
| leafarea |
Leaf area after 16 days |
| shake |
Categorical variable identifying the shaking-stress (1=shaken,
0=non-shaken) |
| light |
Categorical variable identifying the semidarkness-stress
(1=semidarkness, 0=normal light) |
| |
|
Source: Blæsild, P. and Granfeldt, J., (1995), Statistik for
biologer og geologer, Det Naturvidenskabelige Fakultet, University of
Aarhus, Denmark.
Data: stressred.txt
Keywords: Two-way ANOVA.
Description: An experiment was conducted to investigate whether the
drugs levorphanol and/or epinephrine reduce stress. Each treatment
(Treatment 1: levorphanol, Treatment 2: levorphanol and epinephrine,
Treatment 3: epinephrine, and Treatment 4: a control group, receiving
neither drug) was given to five animals, and the corticosterone level
(which reflects the stress level) was measured.
Number of observations: 20
| Variable |
Description |
| level |
Level of corticosterone |
| levor |
Indicating presence (1) or absence (0) of levorphanol in treatment |
| epine |
Indicating presence (1) or absence (0) of epinephrine in treatment |
| |
|
Source: Kleinbaum, D.G., Kupper, L.L., Muller, K.E. and Nizam, A.,
(1998), Applied Regression Analysis and Other Multivariable Methods,
Duxbury Press, Brooks/Cole, p. 545.
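The four treatments in stressred.txt form a 2 x 2 factorial in the two indicator variables, which is why a two-way ANOVA applies. The sketch below spells out that coding and computes the interaction contrast (the "difference of differences") from cell means. The names `TREATMENT_CODING` and `interaction_contrast` are hypothetical, and the cell means are made up purely to exercise the function.

```python
# Treatment numbers from the description mapped to (levor, epine) indicators.
TREATMENT_CODING = {
    1: (1, 0),  # levorphanol only
    2: (1, 1),  # levorphanol and epinephrine
    3: (0, 1),  # epinephrine only
    4: (0, 0),  # control group, neither drug
}


def interaction_contrast(cell_means):
    """Difference of differences in a 2x2 layout.

    cell_means maps (levor, epine) to the mean response in that cell.
    A value of zero means the two drug effects are additive.
    """
    return (cell_means[(1, 1)] - cell_means[(0, 1)]) - (
        cell_means[(1, 0)] - cell_means[(0, 0)]
    )


# Made-up cell means keyed by treatment number (illustration only).
means_by_treatment = {1: 2.0, 2: 5.0, 3: 9.0, 4: 4.0}
cell_means = {TREATMENT_CODING[t]: m for t, m in means_by_treatment.items()}
contrast = interaction_contrast(cell_means)
print("interaction contrast:", contrast)
```

A contrast far from zero, relative to its standard error, would suggest that the effect of levorphanol depends on whether epinephrine is also given.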
Data: tattoos.txt
Keywords: Multiway ANOVA.
Description: These data concern patients who have had forearm
tattoos removed by one of two different surgical methods (labelled A and B).
The gender of each patient was recorded, as well as the size
(small, medium or large) and the depth (moderate or deep) of the tattoos.
The quality of the result was scored from 1 (poor) to 4 (excellent).
Number of observations: 55
| Variable |
Description |
| method |
Method used to remove tattoo (A or B) |
| gender |
Patient's gender (m, f) |
| size |
Size of tattoo (small, medium, large) |
| depth |
Depth of tattoo (moderate, deep) |
| score |
Quality of result (1 to 4) |
| |
|
Source: Lunn, A.D. and McNeil, D.R. (1988) The SPIDA manual.
Statistical Computing Laboratory.
Data: tyres.txt
Keywords: Randomised complete block design.
Description: A small bus company wanted to evaluate the wear of
four types of tyres. Since each of the company's five buses runs a different
route with its own terrain and driving conditions, the company decided to
place one tyre of each type on each of the buses (choosing the wheel
positions randomly).
Number of observations: 20
| Variable |
Description |
| wear |
Wear of tyre |
| tyre |
Type of tyre (numbered 1 to 4) |
| bus |
Specifying the bus (numbered 1 to 5) |
| |
|
Source: Milton, J.S. & Arnold, J.C. (1995) Introduction to
probability and statistics, McGraw Hill International Editions, p. 561.
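The randomisation step in the tyre experiment (one tyre of each type per bus, wheel positions chosen at random) might be sketched as below. It assumes each bus has exactly four wheel positions; the position labels and the function name `randomise_tyres` are hypothetical, since the source names neither.

```python
import random

TYRE_TYPES = [1, 2, 3, 4]
BUSES = [1, 2, 3, 4, 5]
# Hypothetical wheel-position labels; the source does not name them.
POSITIONS = ["front-left", "front-right", "rear-left", "rear-right"]


def randomise_tyres(seed=None):
    """Assign each tyre type to a random wheel position on every bus."""
    rng = random.Random(seed)
    layout = {}
    for bus in BUSES:
        shuffled = TYRE_TYPES[:]
        rng.shuffle(shuffled)  # independent random order for each bus
        layout[bus] = dict(zip(POSITIONS, shuffled))
    return layout


layout = randomise_tyres(seed=2024)
for bus, assignment in layout.items():
    print(bus, assignment)
```

Each bus is a complete block: it carries every tyre type exactly once, so differences between routes are removed from the comparison of tyre types, giving 5 x 4 = 20 observations.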
Data: uric.txt
Keywords: Two-way ANOVA.
Description: These data are the uric acid levels found in the
bloodstreams of persons with Down's syndrome and of subjects without
Down's syndrome. All subjects were between 21 and 25 years of age.
Number of observations: 20
| Variable |
Description |
| uric |
Uric acid level in bloodstream |
| down |
Categorical variable relating to Down's syndrome (persons with Down's
syndrome=1, other=2) |
| gender |
Categorical variable identifying the person's gender (male=1,
female=2) |
| |
|
Source: Kleinbaum, D.G., Kupper, L.L., Muller, K.E. and Nizam, A.,
(1998), Applied Regression Analysis and Other Multivariable Methods,
Duxbury Press, Brooks/Cole, p. 551.
Data: victim.txt
Keywords: Factorial experiment, unequal cell numbers.
Description: A crime victimisation study was undertaken in a
medium-sized southern US city. The main purpose was to determine the effects
of being a crime victim on confidence in the law enforcement authority and
in the legal system itself. A questionnaire was administered to a random
sample of 40 city residents. Among the information elicited were data on the
number of times the resident had been victimised, a measure of social class
status, and a measure of the respondent's confidence in law enforcement and
in the legal system.
Number of observations: 40
| Variable |
Description |
| confidence |
Measure of confidence in law enforcement and legal system |
| victim |
Number of times resident has been victimised (0, 1 or 2+) |
| class |
Categorical variable identifying social class status (Low=1,
Medium=2, High=3) |
| |
|
Source: Kleinbaum, D.G., Kupper, L.L., Muller, K.E. and Nizam, A.,
(1998), Applied Regression Analysis and Other Multivariable Methods,
Duxbury Press, Brooks/Cole, p. 576.
Data: worry.txt
Keywords: Factorial experiment, unequal cell numbers.
Description: These data concern a study of pregnant women's satisfaction
with medical care. The patients were classified according to
two factors: patient worry (positive or negative), and the affectiveness of
physician-patient communication (High, Medium or Low). The variables were
developed from scales based on questionnaires administered to patients and
their physicians.
Number of observations: 50
| Variable |
Description |
| satisfy |
Satisfaction score (between 1 and 10) |
| worry |
Categorical variable identifying the level of worry (Negative=1,
Positive=2) |
| commun |
Categorical variable identifying affectiveness of communication
(High=1, Medium=2, Low=3) |
| |
|
Source: Kleinbaum, D.G., Kupper, L.L., Muller, K.E. and Nizam, A.,
(1998), Applied Regression Analysis and Other Multivariable Methods,
Duxbury Press, Brooks/Cole, p. 562.