Does smoking cause lung cancer? A case study in causation vs. correlation
The debate was about neither tobacco nor cancer. It was about the word “cause”. Many people smoke their whole lives and never get lung cancer, while some get it without ever lighting a single cigarette. When we discussed randomized controlled trials (RCTs) previously, we saw that they are an excellent way to establish a causal relationship. However, we cannot randomly assign people to smoke for decades. Without an RCT, it is hard to convince skeptics, especially statisticians who were smokers themselves, that no lurking third factor is producing the observed correlation between smoking and lung cancer.
Cigarettes were once advertised like this:
Consider a chart plotting per capita cigarette consumption in the US alongside the lung cancer death rate among men: the two curves look strikingly similar. But can we conclude that cigarettes cause lung cancer?
Scientists then carried out a series of observational studies:
Case-control study
- Compare patients who had already been diagnosed with cancer to a control group of healthy volunteers.
- Each group’s members were interviewed about their past behaviors and medical histories.
- Interviewers were not told who had cancer and who was a control.
Problems:
- The data tell us the probability that a cancer patient is a smoker, not the probability that a smoker will get cancer.
- Recall bias: the patients knew whether they had cancer, which could color how they remembered their past habits.
- Selection bias: the cancer patients who were recruited may not represent the wider population.
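Although a case-control study samples people by outcome, it can still estimate the odds ratio, which is the same whether computed as "odds of smoking among cases vs. controls" or "odds of cancer among smokers vs. non-smokers". When the disease is rare, the odds ratio approximates the relative risk. The sketch below uses made-up counts (not data from any real study), and `odds_ratio` is a hypothetical helper name:

```python
# Illustrative only: hypothetical 2x2 counts, NOT data from any real study.
def odds_ratio(case_smokers, case_nonsmokers, control_smokers, control_nonsmokers):
    """Odds of smoking among cases divided by odds of smoking among controls.

    Algebraically equal to (a*d)/(b*c), which is also the odds ratio of
    cancer among smokers vs. non-smokers.
    """
    case_odds = case_smokers / case_nonsmokers
    control_odds = control_smokers / control_nonsmokers
    return case_odds / control_odds

# Hypothetical: 90 of 100 cases smoked, 60 of 100 controls smoked.
print(f"odds ratio = {odds_ratio(90, 10, 60, 40):.1f}")
```

With these invented counts the cases' odds of smoking are 9 and the controls' are 1.5, giving an odds ratio of 6.0.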
Dose-response effect
- Follow a large group of smokers and non-smokers over a period of time.
- Compare the lung cancer death rates of heavy smokers, lighter smokers, and non-smokers, as well as of former heavy smokers who later cut down or quit.
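The comparison above boils down to computing a death rate per exposure level and checking whether it rises with the dose. A minimal sketch, assuming hypothetical counts and person-years (not figures from any real cohort); `death_rate_per_100k` and `shows_dose_response` are illustrative names, and former smokers could be added as another group in the same way:

```python
# Illustrative only: hypothetical cohort counts, NOT figures from any real study.
# Each entry: (exposure group, lung cancer deaths, person-years of follow-up),
# ordered from lowest to highest exposure.
COHORT = [
    ("non-smoker", 10, 100_000),
    ("light smoker", 50, 100_000),
    ("heavy smoker", 120, 100_000),
]

def death_rate_per_100k(deaths, person_years):
    """Lung cancer deaths per 100,000 person-years."""
    return deaths * 100_000 / person_years

def shows_dose_response(cohort):
    """True if the death rate strictly increases with each step up in exposure."""
    rates = [death_rate_per_100k(deaths, py) for _, deaths, py in cohort]
    return all(lower < higher for lower, higher in zip(rates, rates[1:]))

for group, deaths, py in COHORT:
    print(f"{group:>13}: {death_rate_per_100k(deaths, py):.1f} per 100k person-years")
print("dose-response pattern:", shows_dose_response(COHORT))
```

A monotonic rise is suggestive, but as the next point notes, it does not by itself rule out self-selection.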
Problems:
- Smokers are self-selected: they may differ “genetically” from non-smokers.
Cornfield’s inequality
Suppose there is a confounding factor that completely accounts for the excess cancer risk of smokers. If smokers have nine times the risk of developing lung cancer, then the confounding factor must be at least nine times as common among smokers as among nonsmokers to explain the difference in risk. (I may work through the proof in a later post; for now, let’s take it for granted.)
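Written as a formula, with S for smoking, \bar{S} for not smoking, and G for the hypothetical confounder (notation introduced here for illustration), the condition stated above and the ceiling it implies for the confounder's prevalence among nonsmokers are:

```latex
\frac{P(G \mid S)}{P(G \mid \bar{S})} \;\ge\; RR
  = \frac{P(\text{cancer} \mid S)}{P(\text{cancer} \mid \bar{S})} = 9,
\qquad
P(G \mid S) \le 1 \;\Longrightarrow\; P(G \mid \bar{S}) \le \frac{1}{9} \approx 11.1\%.
```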
For example, if 11 percent of nonsmokers had the “smoking gene”, then 99 percent of smokers would have to have it. If 12 percent of nonsmokers had it, then 108 percent of smokers would need it, which is impossible. So unless the gene is almost absent among nonsmokers (at most one in nine) and nearly universal among smokers, it cannot fully account for the association between smoking and cancer. This made a purely genetic explanation implausible and gave strong support to the conclusion that smoking causes lung cancer.
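The arithmetic in this example can be checked directly. A minimal sketch, assuming the simplified form of Cornfield's bound stated above (required prevalence among smokers = relative risk × prevalence among nonsmokers); the function names are hypothetical:

```python
def required_smoker_prevalence(relative_risk, nonsmoker_prevalence):
    """Minimum share of smokers who must carry a hypothetical confounder
    for it to fully explain the observed relative risk (Cornfield's bound)."""
    return relative_risk * nonsmoker_prevalence

def confounder_can_explain(relative_risk, nonsmoker_prevalence):
    """A required prevalence above 1 (100 percent) is impossible."""
    return required_smoker_prevalence(relative_risk, nonsmoker_prevalence) <= 1.0

RR = 9  # smokers' relative risk of lung cancer, as in the example above
for p0 in (0.11, 0.12):
    p1 = required_smoker_prevalence(RR, p0)
    print(f"nonsmokers {p0:.0%} -> smokers need {p1:.0%}, "
          f"possible: {confounder_can_explain(RR, p0)}")
```

The cutoff sits at 1/RR, so with a relative risk of 9 any confounder present in more than about 11.1 percent of nonsmokers cannot carry the whole association on its own.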
It was a remarkable achievement!