# Collect data

## 18.2 Collect data

Statistics allow us to interpret what has happened in the past, so that we can predict what is likely to happen in the future – and plan for it. A vital part of working with statistics is gathering information, recording it, and then presenting it so that it can be understood easily and used by other people for making decisions. We call this process data handling.

data handling: gathering and recording information, and then presenting it in a way that can be understood easily and used by other people

The data handling cycle can be summarised as shown in the diagram. Notice that the process is a cycle, because once we have summarised, represented and analysed the results, these findings may lead to new questions that we need to research. Then we will need to start the cycle again.

### Different ways of collecting data

The first step in any statistical process is data collection, which is gathering the data. The subject that we are researching will affect the way we collect the data. Four methods of collecting data are shown in the diagram.

#### 1. Observation and measurement

Collecting data using observation or measurement involves looking at something that actually happens, and then measuring and recording it. An example of observation is counting the number of cars passing the gate of your school every hour. An example of measurement is measuring the heights of all the learners in your class.

#### 2. Interviews

An interview usually takes place between two people. One of the people is called the interviewer and the other is the interviewee or respondent. We use interviews in cases where it is possible to talk to the respondents directly. For example, we could interview people leaving a shop to find out whether they were happy with the service they received.

#### 3. Questionnaires

A questionnaire or survey is a set of questions given to people to complete. A questionnaire is useful for getting information from many people, as it can be handed out and then collected later, and does not need an interviewer. It is the best method of collecting data when you want to ask a large group of people what they think about a specific issue. The questions can be multiple-choice questions or free-form questions, for which respondents write down their own responses.

#### 4. Databases

A database is an organised collection of data that someone else has already collected and presented. Databases can be stored on a computer or on the internet, or presented in publications such as reports, newspapers and magazines.

A population is the entire group that is being studied. A sample is a subset of the population that is selected for collecting data. We often use a sample, because collecting data from the whole population would be too big a task. It is very important that the sample is representative of the population; otherwise, the data is likely to be biased.

population: the entire group being studied

sample: a subset of the population that represents the population

bias: the over- or underestimating of an outcome in a data set

## Worked Example 18.1: Choosing an appropriate sample to collect data

Lindi wants to know what colour of mobile phone is the most popular in her school. She asks $$14$$ of her friends and records their answers.

| Mobile phone colour | Number of friends |
|---|---|
| Red | $$2$$ |
| Silver | $$4$$ |
| Gold | $$7$$ |
| Blue | $$1$$ |

Use this information to answer the following questions.

• Which colour of mobile phone is the most popular?
• Do you think Lindi chose a suitable sample for gathering data?
• How could Lindi have chosen a sample that better represents the learners at her school?

From the table, we see that gold is the most popular colour, because it was chosen by the most friends ($$7$$ out of $$14$$).
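
Reading the answer off the table can also be checked with a few lines of code. The sketch below is only an illustration: it stores Lindi's counts in a dictionary (the variable names are our own) and picks the colour with the highest count.

```python
# Tally from Lindi's table (colour -> number of friends).
phone_colours = {"Red": 2, "Silver": 4, "Gold": 7, "Blue": 1}

# The sample size is the sum of all the counts.
sample_size = sum(phone_colours.values())

# The most popular colour is the one with the highest count.
most_popular = max(phone_colours, key=phone_colours.get)

print(sample_size)   # 14
print(most_popular)  # Gold
```

The total of $$14$$ matches the number of friends Lindi asked, which is a quick check that no answer was left out of the table.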

### Evaluate the chosen sample.

Lindi chose $$14$$ of her friends for the sample. This sample is most likely biased, as friends can have similar styles and preferences, and they can also influence each other’s choices. It is also likely that the $$14$$ learners are all in the same grade. It also seems to be a small sample, but this would depend on the total number of learners at the school.

### Suggest a better sample for collecting data.

Lindi could have chosen five learners at random from each grade. This sample would be more representative of all the learners at her school.
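
Choosing five learners at random from each grade can be sketched in code. The example below is a minimal illustration, not a prescribed method: the school roster (grades 8 to 12 with 40 learners each), the learner names and the function name `stratified_sample` are all assumptions made up for the example.

```python
import random


def stratified_sample(learners_by_grade, per_grade, seed=None):
    """Pick `per_grade` learners at random from each grade."""
    rng = random.Random(seed)
    sample = {}
    for grade, learners in learners_by_grade.items():
        # rng.sample chooses without replacement, so no learner is picked twice.
        sample[grade] = rng.sample(learners, per_grade)
    return sample


# Hypothetical school: grades 8 to 12 with 40 learners in each grade.
school = {
    grade: [f"Grade {grade} learner {n}" for n in range(1, 41)]
    for grade in range(8, 13)
}

chosen = stratified_sample(school, per_grade=5, seed=1)
total_chosen = sum(len(names) for names in chosen.values())
print(total_chosen)  # 25
```

Because every grade contributes the same number of learners, no single grade or friendship group can dominate the sample, which is the weakness identified in Lindi's original choice.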

