Presentation of data in an orderly manner often calls for a graphic display.
Graphics programs for the computer have made this easier,
but it still requires the application of basic techniques.
The first consideration for a graph is whether the graph is needed and, if
so, the type of graph to be used. For accuracy, a well-constructed table of data
usually provides more information than a graph. The values obtained and their
variability are readily apparent in a table, and interpolation (reading the graph)
is unnecessary. For visual impact, however, nothing is better than a graphic
display.
There are a variety of graph types to choose from, e.g., line graphs, bar
graphs, and pie graphs. Each of these has its own characteristics and subdivisions.
One also has to decide upon single or multiple graphs, 2-dimensional
or 3-dimensional displays, presence or absence of error bars, and the
aesthetics of the display. The latter include such details as legend bars, axis
labels, titles, selection of the symbols to represent data, and patterns for bar
graphs.
For a 2-dimensional graph, with 2 values (x and y), which value is x and which
is y? The answer is always the same: the known value is always the abscissa
(x) value. The value that is measured is the ordinate (y) value. For a standard
curve of absorption in spectrophotometry, the known concentrations of the
standards are placed on the x-axis, while the measured absorbance would be
on the y-axis. For measurements of the diameter of cells, the x-axis would be
a micron scale, while the y-axis would be the number of cells with a given
diameter.
Unless you are specifically attempting to demonstrate an inverted function,
the scales should always be arranged with the lowest value on the left of the
x-axis, and the lowest value at the bottom of the y-axis. The range of each
scale should be determined by the lowest and highest values of your data, with
the scale rounded outward to the nearest ten, hundred, thousand, etc. That is, if
the data range from 12 to 93, the scale should be from 10 to 100. It is not
necessary to always start from 0, unless you wish to demonstrate the relationship
of the data to this value (as in a spectrophotometric standard curve).
The number of intervals placed on the graph will be determined by the
point you wish to make, but in general, one should use about 10 divisions of
the scale. For our range of 12 to 93, an appropriate scale would be from 0 to
100, with an interval of 10. Placing smaller intervals on the scale does not
convey more information, but merely adds a lot of confusing marks to the
graph. The user can estimate the values of 12 and 93 from such a scale without
having every possible value ticked off.
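The scale-rounding rule can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names axis_limits and tick_values are hypothetical, and the choice of the tick interval as a power of ten is one simple reading of the rule, not a method given in the text. For the range of 12 to 93 it yields a scale from 10 to 100; starting the scale at 0 instead remains a matter of judgment.

```python
import math

def axis_limits(lowest, highest):
    """Return (start, stop, step) for a scale that brackets the data.

    Assumption: the tick interval is the largest power of ten that does
    not exceed the span of the data. Data with a narrow span may end up
    with very few divisions and would need a finer interval.
    """
    if highest <= lowest:
        raise ValueError("highest must be greater than lowest")
    step = 10 ** math.floor(math.log10(highest - lowest))
    start = math.floor(lowest / step) * step   # round the low end down
    stop = math.ceil(highest / step) * step    # round the high end up
    return start, stop, step

def tick_values(lowest, highest):
    """List the tick positions implied by axis_limits."""
    start, stop, step = axis_limits(lowest, highest)
    count = round((stop - start) / step)
    return [start + i * step for i in range(count + 1)]

print(axis_limits(12, 93))   # scale for the worked example in the text
print(tick_values(12, 93))
```

For data ranging from 12 to 93, this gives a scale of 10 to 100 in intervals of 10.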
Line Graph vs. Bar Graph or Pie Graph
If the presentation is to highlight various data as a percentage of the total data,
then a pie graph is ideal. Pie graphs might be used, for example, to demonstrate
the composition of the white cell differential count. They are also widely used
in business, particularly for displaying budget details.
Pie graphs are circular presentations that are drawn by summing your data
and computing the percent of the total for each data entry. Each percent value
is then converted to a portion of a circle (by multiplying the fraction of the
total by 360°), and the corresponding arc of the circle is drawn. By connecting
the ends of each arc to the center point of the circle, the pie is divided into
wedges, the size of which demonstrates the relative size of each data entry to
the total. If one or more wedges are to be highlighted, they can be drawn slightly
outside the perimeter of the circle for what is referred to as an “exploded” view.
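As a rough sketch of the wedge calculation just described, the following Python converts a set of counts into wedge angles. The function name pie_angles and the differential count values are hypothetical, chosen only to illustrate the arithmetic.

```python
def pie_angles(counts):
    """Map each label to its wedge angle in degrees.

    Each entry's fraction of the total is multiplied by 360 degrees.
    """
    total = sum(counts.values())
    return {label: 360.0 * value / total for label, value in counts.items()}

# Hypothetical white cell differential count (cells out of 100 counted).
differential = {
    "neutrophils": 60,
    "lymphocytes": 30,
    "monocytes": 6,
    "eosinophils": 3,
    "basophils": 1,
}

for label, angle in pie_angles(differential).items():
    print(f"{label}: {angle:.1f} degrees")
```

With these illustrative counts, the neutrophil wedge spans 216° (60% of the circle) and all wedges together sum to 360°.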
More typical of data presented in cell biology, however, are the line graph
and the bar graph. There is no hard and fast rule for choosing between these
graph types, except where the data are noncontinuous. Then, a bar graph must
be used. In general, line graphs are used to demonstrate data that are related
on a continuous scale, whereas bar graphs are used to demonstrate discontinuous
or interval data.
Suppose, for example, that you decide to count the number of T-lymphocytes
in 4 slices of tissue, one each from the thymus, Peyer’s patches, a lymph
node, and a healing wound on the skin. Let’s label each of these as T, P, L,
and S, respectively. The numbers obtained per cubic centimeter of each tissue
are T = 200, P = 150, L = 100, and S = 50. Note that there is a rather nice linear
decrease in the numbers if T is placed on the left of an x-axis, and S to the right.
A line graph of these data would produce a nice straight line, with a statistical
regression fit and slope. But look at the data! There is no reason to place
T (or P, L, or S) to the right or left of any other point on the graph—the
placement is totally arbitrary. A line graph for these data would be completely
misleading since it would imply that there is a linear decrease from the thymus
to a skin injury and that there was some sort of quantitative relationship among
the tissues. There is certainly a decrease, and a bar graph could demonstrate
that fact, by arranging the tissue type on the x-axis in such a way to demonstrate
that relationship—but there is no inherent quantitative relationship
between the tissue types that would force one and only one graphic display.
Certainly, the thymus is not 4 times some value of skin (although the numbers
might suggest this).
However, were you to plot the number of lymphocytes with increasing
distance from the point of a wound in the skin, an entirely different presentation
would be called for. Distance is a continuous variable. We may choose to
collect the data in 1-mm intervals, or 1 cm. The range is continuous from 0 to
the limit of our measurements. That is, we may wish to measure the value at
1 mm, 1.2 mm, 1.23 mm, or 1.23445 mm. The important point is that the 2-mm
position is twice as far from the wound as the 1-mm position. There is a linear
relationship between the values to be placed on the x-axis. Therefore, a line
graph would be appropriate, with the dots connected by a single line. If we
choose to ignore the 1.2 and 1.23 and round these down to a value of 1, then a
bar graph would be more appropriate. The graph produced by this latter technique
(dividing the data into appropriate intervals and plotting as a bar graph) is
known as a histogram.
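The rounding-down step behind a histogram can be sketched as follows. This is a minimal illustration: the function name histogram_counts and the list of distances are hypothetical, and each value is assumed to be the distance of one counted lymphocyte from the wound.

```python
import math
from collections import Counter

def histogram_counts(measurements, width=1.0):
    """Count measurements per interval, keyed by the interval's lower edge.

    Each value is rounded down to the start of its interval, so 1.2 and
    1.23 both fall in the interval beginning at 1 when width is 1.
    """
    bins = Counter(math.floor(m / width) * width for m in measurements)
    return dict(sorted(bins.items()))

# Hypothetical distances (mm) of individual lymphocytes from the wound.
cell_distances_mm = [1.0, 1.2, 1.23, 2.4, 2.9, 3.1]

for lower_edge, count in histogram_counts(cell_distances_mm).items():
    print(f"{lower_edge:.0f} to {lower_edge + 1:.0f} mm: {count} cells")
```

Each interval then becomes one bar, with the count as its height.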
Having decided that the data have been collected as a continuous series,
and that they will be plotted on a line graph, there are still decisions to be
made. Should the data be placed on the graph as individual points with no
lines connecting them (a scattergram)? Should a line be drawn between the
points (known as a dot-to-dot)? Should the points be plotted, but curve smoothing
be applied? If the latter, what type of smoothing?
There are many algorithms for curve fitting, and the 2 most commonly used
are linear regression and polynomial regression. It is important to decide,
before graphing the data, which of these is appropriate.
Linear regression is used when there is good reason to suspect a linear
relationship within the data (for example, in a spectrophotometric standard
curve following the Beer-Lambert law). In general, the y-value can be calculated
from the equation for a straight line, y = mx + b, where m is the slope and b is
the y-intercept.
Computer programs for this can be very misleading. Any set of data can
be entered into a program to calculate and plot linear regression. It is important
that there be a valid reason for supposing linearity before using this function,
however. This is also true when using polynomial regressions. This type of
regression calculates an ideal curve based on equations with increasing
powers of x, that is, y = b + m_1 x + m_2 x^2 + ... + m_n x^n, where n is greater than 1. The
mathematics of this can become quite complex, but often the graphic displays
look better to the beginning student. It is important to note that use of polynomial
regression must be warranted by the relationship within the data, not by
the individual drawing the graph.
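For readers who want to see what a regression program does for the straight-line case, here is a minimal least-squares sketch for y = mx + b. The function name fit_line and the standard curve values are hypothetical. Ordinary least squares is assumed here as the fitting method, since the text does not name one.

```python
def fit_line(xs, ys):
    """Return (m, b) for the least-squares line y = m*x + b."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    m = sxy / sxx              # slope
    b = mean_y - m * mean_x    # y-intercept
    return m, b

# Hypothetical standard curve: known concentrations on x, absorbance on y.
concentrations = [0, 2, 4, 6, 8]
absorbances = [0.0, 0.1, 0.2, 0.3, 0.4]
m, b = fit_line(concentrations, absorbances)
print(f"slope = {m:.3f}, intercept = {b:.3f}")

# The same routine will happily fit categories coded as 1 to 4, which is
# exactly the misuse the text warns about.
m_bad, b_bad = fit_line([1, 2, 3, 4], [200, 150, 100, 50])
print(f"slope = {m_bad:.1f}, intercept = {b_bad:.1f}")
```

The second call returns a perfect line (slope of -50) for the tissue counts T, P, L, and S, even though the order of the tissues on the x-axis is arbitrary. The arithmetic cannot tell whether linearity is justified.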
For single sets of data, that is the extent of the available options. For
multiple sets, the options increase. If the multiple sets are data collected at
identical x-values, then error bars (standard deviation or standard
error of the mean) can be added to the graph. Plots can also be made where 2
lines are drawn, one connecting the highest y-values for each x and a second
connecting the lowest values (the Hi-Lo Graph). The area between the 2 lines
presents a graphic depiction of variability at each x-value.
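The quantities behind error bars and a Hi-Lo graph can be computed as in the sketch below. The function name summarize_replicates and the replicate values are hypothetical. The sample standard deviation (n - 1 in the denominator) is assumed, and the standard error is taken as the standard deviation divided by the square root of the number of replicates.

```python
import math
import statistics

def summarize_replicates(replicates):
    """Summarize repeated y-values measured at each x-value.

    replicates maps each x-value to a list of at least 2 y-values.
    """
    summary = {}
    for x, ys in replicates.items():
        sd = statistics.stdev(ys)          # sample standard deviation
        summary[x] = {
            "mean": statistics.mean(ys),
            "sd": sd,
            "sem": sd / math.sqrt(len(ys)),  # standard error of the mean
            "low": min(ys),                  # lower line of a Hi-Lo graph
            "high": max(ys),                 # upper line of a Hi-Lo graph
        }
    return summary

# Hypothetical triplicate measurements at two x-values.
data = {1: [10, 12, 14], 2: [20, 20, 23]}
for x, stats in summarize_replicates(data).items():
    print(x, stats)
```

The mean gives the plotted point, the sd or sem gives the half-length of the error bar, and the low and high values trace the two lines of the Hi-Lo graph.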
If the data collected involve 2 or more sets with a common x-axis, but
varying y-axis (or values), then a multiple graph may be used. The rules for
graphing apply to each set of data, with the following provision: keep the
number of data sets on any single graph to an absolute minimum. It is far better
to have 3 graphs, each with 3 lines (or bars), than to have a single graph with
9 lines. A graph that contains an excess of information (such as 9 lines) is
usually ignored by the viewer (as are tables with extensive lists of data). For
this same reason, all unnecessary clutter should be removed from the graph;
e.g., grid marks on the graph are rarely useful.
Finally, it is possible to plot 2 variables, y and z, against a common value,
x. This is done with a 3D graphic program. The rules for designing a graph
also apply to this type of graph, and drawing these should clearly be left to
a computer graphics program. These graphs often look appealing with their hills
and valleys, but rarely impart any more information than 2 separate 2D graphs.
Perhaps the main reason is that people are familiar with 2-dimensional graphs,
but have a more difficult time visually interpreting 3-dimensional graphs.