Collecting Good Data: Making Sense of Data Chapter 3

Chapter 2 discusses how to visualize processes and systems. Chapter 3 covers common problems in collecting data so that we can avoid them when analyzing systems. The chapter has three main parts: general requirements for collecting good data, operational definitions, and some common pitfalls in interpreting the data collected.

For the first part, the author provides some general questions to ask when collecting data. I think these questions help us understand the context of the data collected, which is important for interpretation.

  • What is the purpose of collecting data? (Data collected for one purpose may not be enough for another purpose. For example, compare the population of the world with the population of a city. The world’s population includes the city’s population, but if we really want to know the population of a city, it is a bad idea to refer to the world’s population, since it does not give us the accuracy we want.)
  • Who is going to use the data? (On one hand, if no one is using the data, and you are not required to keep and report the measure, then it should be dropped. On the other hand, knowing who is going to use the data can give you an idea of the purpose of collecting it.)
  • How are the values obtained? (For example, suppose the value is the count of people in the building. It looks simple, but you need to understand what is meant by “in the building” and by “people”. Do you count the delivery man?)
  • What assumptions were made in obtaining values? (This is similar to the previous question with the focus on assumptions.)
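
To make the "people in the building" question concrete, here is a minimal Python sketch. The people, the role categories, and the `count_people` helper are all made up for illustration and do not come from the book. It only shows that the same snapshot gives different counts depending on which definition of "people" is applied.

```python
from dataclasses import dataclass


@dataclass
class Person:
    name: str
    role: str  # hypothetical categories: "employee", "visitor", "delivery"


# Hypothetical snapshot of who is physically inside at the moment of counting.
inside = [
    Person("Ann", "employee"),
    Person("Bob", "employee"),
    Person("Cara", "visitor"),
    Person("Dan", "delivery"),
]


def count_people(people, counted_roles):
    """Count only the people whose role is covered by the definition in use."""
    return sum(1 for p in people if p.role in counted_roles)


# Three plausible definitions of "people in the building".
staff_only = count_people(inside, {"employee"})
staff_and_visitors = count_people(inside, {"employee", "visitor"})
everyone = count_people(inside, {"employee", "visitor", "delivery"})

print(staff_only, staff_and_visitors, everyone)
```

None of the three numbers is wrong. They simply answer different questions, which is why the assumptions behind a value need to be recorded along with it.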

The second part is about operational definitions, i.e. how to write a definition that is operational (actionable). Writing a definition is a very difficult task. I have personally been involved in writing a so-called business dictionary, where we collected all the common operational terms and tried to write a definition for each of them. After reading the chapter, the resulting business dictionary seems unsatisfactory to me. We recognized quite early that the definitions we wrote could not be used in operation. They were too brief and too general.

An operational definition consists of three parts: a criterion to be applied, a test of compliance to be applied, and a decision rule for interpreting the test results. The wording is a bit complicated, so I will rephrase it as two steps: specify the requirements that include or exclude something from the thing you want to define, and specify a method of testing those requirements. I view the test and the decision rule as one unit, i.e. a method of testing. For example, if I have to define what a water bottle is, I need to specify the requirements, such as features, that determine whether something is or is not a water bottle. Then I need to develop a testing methodology to check an object against those specifications, so that I know whether it really is a water bottle.
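
The three parts can be sketched in Python using the water bottle example. Everything specific here is my own assumption for illustration, not something from the book: the resealable-cap criterion, the leak test, the 1 ml threshold, and all the names.

```python
from dataclasses import dataclass


@dataclass
class Item:
    name: str
    has_resealable_cap: bool
    leaked_ml: float  # water lost during the hypothetical leak test


# Criterion (assumed): a water bottle has a resealable cap and holds water.
# Test (assumed): fill it, close it, lay it on its side, measure the water lost.
# Decision rule (assumed): it passes if the loss is at most MAX_LEAK_ML.
MAX_LEAK_ML = 1.0


def run_leak_test(item: Item) -> float:
    """Stand-in for the physical test; here it just reads the recorded loss."""
    return item.leaked_ml


def is_water_bottle(item: Item) -> bool:
    """Apply the criterion, then the decision rule to the test result."""
    if not item.has_resealable_cap:
        return False
    return run_leak_test(item) <= MAX_LEAK_ML


flask = Item("steel flask", has_resealable_cap=True, leaked_ml=0.0)
mug = Item("coffee mug", has_resealable_cap=False, leaked_ml=250.0)
cracked = Item("cracked bottle", has_resealable_cap=True, leaked_ml=40.0)

print(is_water_bottle(flask), is_water_bottle(mug), is_water_bottle(cracked))
```

The point is not the particular threshold but that two people applying this definition to the same object should reach the same answer.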

You may wonder why the author covers “definition” in a chapter about collecting good data. I think it is very important to define the meaning of whatever measures you have, so that everyone is on the same page and the results are interpreted correctly. Writing operational definitions helps other people answer the earlier questions: How are the values obtained? What assumptions were made in obtaining the values?

Finally, the author briefly covers some pitfalls in interpretation. I think the last part is just a brief note; it certainly does not cover all the potential problems you may encounter. So, I will also be brief here.

  • Some measures, although numeric, do not have a well-defined concept of distance. For example, suppose cheating in school costs 20 marks and being late to school costs 10 marks. The numbers suggest that cheating is twice as bad as being late. Would you then rather be late than cheat? Or would you rather cheat if you are going to be late for the second time (or maybe take a taxi)?
  • A pseudo-average is slippery at best. Surveys commonly use a numeric scale to represent strongly disagree, disagree, neutral, agree, and strongly agree, and people usually like to compute the average of those numbers. But is strongly disagree really twice as bad as disagree?
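
A small Python sketch shows why the average of such survey codes is slippery. The responses and both codings are invented for illustration. Both codings keep the same order of the five answers, yet the group with the higher mean changes, while the median answer keeps the same ranking.

```python
from statistics import mean, median

# Hypothetical survey responses from two groups.
group_a = ["disagree", "disagree", "strongly agree"]
group_b = ["neutral", "neutral", "agree"]

# Two codings that both respect the order of the answers.
equal_steps = {
    "strongly disagree": 1, "disagree": 2, "neutral": 3,
    "agree": 4, "strongly agree": 5,
}
stretched_top = {
    "strongly disagree": 1, "disagree": 2, "neutral": 3,
    "agree": 4, "strongly agree": 10,
}


def summarize(responses, coding):
    """Return (mean, median) of the responses under the given numeric coding."""
    values = [coding[r] for r in responses]
    return mean(values), median(values)


mean_a_equal, median_a_equal = summarize(group_a, equal_steps)
mean_b_equal, median_b_equal = summarize(group_b, equal_steps)
mean_a_stretched, median_a_stretched = summarize(group_a, stretched_top)
mean_b_stretched, median_b_stretched = summarize(group_b, stretched_top)

# With equal steps, group B has the higher mean; with the stretched coding, A does.
print(mean_a_equal < mean_b_equal, mean_a_stretched > mean_b_stretched)
# The medians rank the groups the same way under both codings.
print(median_a_equal < median_b_equal, median_a_stretched < median_b_stretched)
```

Since the survey itself only tells us the order of the answers, nothing in the data says which coding is the right one, so a conclusion that depends on the coding is not really supported by the data.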

The author also highlights two quotes. I do not find them directly related to the points in the previous few paragraphs, but they express a general philosophy of collecting data, so I will quote them as well: “Unless all the values in a set of values are collected in a consistent manner, the values will not be comparable.” and “It is rarely satisfactory to use data which have been collected for one purpose for a different purpose.”
