Originally posted by royalchicken
You'd need two people, one to watch for litterers, and one to sniff them. If you missed enough litterers because you were sniffing the previous litterer, your results would be very suspect. Also, smokers would be overrepresented because you're more likely to be able to catch up to them.
If you missed enough litterers because you were sniffing the previous litterer, your results would be very suspect.
I imagine both types of litterings occur according to an exponential distribution, which is memoryless. Even if there is a different mean governing each, does this bias the expected observed proportion? I don't think it does, but I'm a little rusty. It seems that any missed litterings that took place during the sniffing process are irrelevant to the expected observed proportion. After the sniff, you expect to see a litterer of each type according to their actual memoryless distributions, unaffected by any that you have missed during the sniff.
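For anyone who wants to check the memorylessness argument above numerically, here is a minimal Python sketch. It is an illustration under stated assumptions, not part of the proposed experiment: both kinds of littering are simulated as independent streams with exponential gaps, and the rates (2.0 and 1.0 per unit time), the sniff time of 1.5, and the function name `simulate_observed_share` are all made up. The catch-up bias is not modelled. The full event streams are generated first and the observer then skips whatever happens during a sniff, so memorylessness is not assumed in the observation step itself.

```python
import random

def simulate_observed_share(rate_smoker, rate_other, sniff_time, horizon, seed=0):
    """Share of recorded litterers who are smokers when the observer is
    busy (and misses everything) for sniff_time after each recorded event.
    All parameter values used below are hypothetical."""
    rng = random.Random(seed)

    # Build each littering stream independently with exponential gaps.
    events = []
    for rate, label in ((rate_smoker, "smoker"), (rate_other, "other")):
        t = rng.expovariate(rate)
        while t < horizon:
            events.append((t, label))
            t += rng.expovariate(rate)
    events.sort()

    # Walk the merged stream; events during a sniff are missed.
    busy_until = 0.0
    seen = {"smoker": 0, "other": 0}
    for t, label in events:
        if t < busy_until:
            continue
        seen[label] += 1
        busy_until = t + sniff_time
    return seen["smoker"] / (seen["smoker"] + seen["other"])

# True share of litterings by smokers is 2.0 / (2.0 + 1.0) = 2/3.
share = simulate_observed_share(2.0, 1.0, sniff_time=1.5, horizon=50_000.0)
print(round(share, 3))
```

Under these assumptions the recorded share should sit close to 2/3 even though most litterings are missed. Swapping the exponential gaps for some other distribution is the way to see how much the missed observations matter when the process is not memoryless.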
Also, smokers would be overrepresented because you're more likely to be able to catch up to them.
This is definitely a source of bias that I overlooked.
Originally posted by DoctorScribbles
If you missed enough litterers because you were sniffing the previous litterer, your results would be very suspect.
I imagine both types of litterings occur according to an exponential distribution, which is memoryless. Even if there is a different mean governing each, does this bias the expected observed proportion? I don't think it does ...[text shortened]... to be able to catch up to them.
This is definitely a source of bias that I overlooked.
I call you out on your failure to answer my Poser on memorylessness, posted nearly a week ago.
If the distribution is actually exponential, then I believe your analysis is correct. It's likely to be very nearly exponential, so your analysis is correspondingly very nearly correct.
Originally posted by DoctorScribbles
If you missed enough litterers because you were sniffing the previous litterer, your results would be very suspect.
I imagine both types of litterings occur according to an exponential distribution, which is memoryless. Even if there is a different mean governing each, does this bias the expected observed proportion? I don't think it does ...[text shortened]... to be able to catch up to them.
This is definitely a source of bias that I overlooked.
The idea that there's a cultural habit associated with some types of littering contradicts the hypothesis of memorylessness.
Edit: My hypothesis is that littering today makes it more likely that both types litter tomorrow, but not exactly in the same proportion.
I believe that bivariate Weibull would be more adequate under my hypothesis.
Originally posted by Palynka
The idea that there's a cultural habit associated with some types of littering contradicts the hypothesis of memorylessness.
Edit: My hypothesis is that littering today makes it more likely that both types litter tomorrow, but not exactly in the same proportion.
I believe that bivariate Weibull would be more adequate under my hypothesis.
Identify one member of RHP who was initially a non-general litterer and then became a general litterer due to having seen cigarette butts littered.
In order for the Weibull distribution to serve as a superior estimate of the process than the exponential distribution for the purpose of evaluating the experimental design, you'd have to claim that the littering rates increase significantly during the duration of the experiment. If they don't, the Weibull collapses into an exponential distribution.
I think the experiment can be conducted to an acceptable degree of confidence in an afternoon. Are you suggesting that the rate compounds multiple times in an afternoon? If that were the case, we'd expect to be already buried under litter. Since we're not, I maintain that the exponential distribution is a suitable estimate, and we thus need not be concerned about missed observations creating a bias.
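To spell out the collapse of the Weibull into the exponential mentioned above, here is the standard shape-scale form. The symbols k (shape) and \lambda (scale) are notation chosen for this sketch, not taken from the posts; t is the time since the previous littering.

```latex
S(t) = \exp\!\left[-\left(\frac{t}{\lambda}\right)^{k}\right], \qquad
h(t) = \frac{k}{\lambda}\left(\frac{t}{\lambda}\right)^{k-1}, \qquad t \ge 0,

k = 1 \;\Longrightarrow\; S(t) = e^{-t/\lambda}, \qquad h(t) = \frac{1}{\lambda}.
```

With k = 1 the hazard h(t) is constant, which is exactly the memoryless exponential case. For k > 1 the hazard rises with the time since the last event, and for k < 1 it falls.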
Originally posted by DoctorScribbles
Identify one member of RHP who was initially a non-general litterer and then became a general litterer due to having seen cigarette butts littered.
In order for the Weibull distribution to serve as a superior estimate of the process than the exponential distribution for the purpose of evaluating the experimental design, you'd have to claim that th ...[text shortened]... table estimate, and we thus need not be concerned about missed observations creating a bias.
The observation and the process are two different things. If you approximate a Weibull by an exponential, your errors may be small for the sample, but may be completely off the mark after some time. In short, you would be biasing your model.
All estimators traverse the sample optimally... for a certain parametric form. The problem is that if you choose a parametric form different from the data-generating process, your parameter estimates will be biased.
For samples of the size of an afternoon the weibull could be approximated by an exponential, but all your conclusions would then be completely wrong if you tried to extrapolate the conclusions for a more significant period of time.
Edit: Choosing a Weibull model does not need the rate to multiply several times; it is also valid for smaller differences. In the end, you could always test the significance of the extra parameter and fall back to an exponential if the extra parameter was statistically non-significant.
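One way to test the extra parameter, sketched in plain Python, is a likelihood-ratio test of the Weibull shape against 1. This is a swapped-in illustration rather than anything specified in the thread: the function names are hypothetical, the data are simulated instead of observed, and the shape is found by simple bisection on the profile-likelihood equation. All inter-littering times are assumed to be strictly positive.

```python
import math
import random

def fit_weibull_shape(times, lo=0.05, hi=20.0, iters=100):
    """Maximum-likelihood Weibull shape k, found by bisection."""
    mean_log = sum(math.log(t) for t in times) / len(times)

    def score(k):
        # Profile-likelihood equation for k; it is increasing in k.
        powered = [t ** k for t in times]
        weighted = sum(p * math.log(t) for p, t in zip(powered, times))
        return weighted / sum(powered) - 1.0 / k - mean_log

    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if score(mid) > 0:
            hi = mid
        else:
            lo = mid
    return 0.5 * (lo + hi)

def weibull_loglik(times, k):
    """Weibull log-likelihood at shape k, with the scale at its MLE for that k."""
    n = len(times)
    scale = (sum(t ** k for t in times) / n) ** (1.0 / k)
    sum_log = sum(math.log(t) for t in times)
    return (n * math.log(k) - n * k * math.log(scale)
            + (k - 1.0) * sum_log - sum((t / scale) ** k for t in times))

def lr_statistic(times):
    """Return (fitted shape, likelihood-ratio statistic against shape 1)."""
    k_hat = fit_weibull_shape(times)
    # Shape 1 is the exponential model, so one function covers both fits.
    return k_hat, 2.0 * (weibull_loglik(times, k_hat) - weibull_loglik(times, 1.0))

# Simulated stand-ins for observed inter-littering times (hypothetical).
rng = random.Random(1)
expo_times = [rng.expovariate(1.0) for _ in range(2000)]
weib_times = [rng.weibullvariate(1.0, 2.0) for _ in range(2000)]  # scale 1, shape 2

k_expo, lr_expo = lr_statistic(expo_times)
k_weib, lr_weib = lr_statistic(weib_times)
print(k_expo, lr_expo)
print(k_weib, lr_weib)
```

The statistic is compared with the 5% critical value of a chi-square with one degree of freedom, about 3.84. If it falls below that, the shape parameter is not significantly different from 1 and the exponential is retained.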
Originally posted by Palynka
For samples of the size of an afternoon the weibull could be approximated by an exponential, but all your conclusions would then be completely wrong if you tried to extrapolate the conclusions for a more significant period of time.
This objection is entirely different from the one RC raised. His was an objection to the implementation of the proposed experiment.
Your objection is that the proposed experiment is addressing the wrong question altogether.
The proposed experiment addresses the static question of whether at any given period, such as today, smokers are more likely to litter than non-smokers. For an experiment that addresses this question, your objection is irrelevant, as neither the null hypothesis nor its denial speak to a trend of the littering rate (i.e., how fast each group litters) but only to the current proportions of litterers who are smokers. The experimental outcome would only be used to extrapolate future trends in the littering rates (i.e., how quickly people litter) by someone who didn't understand the experiment.
You seem to be proposing an entirely different experiment which attempts to answer the question of whether the frequency of littering (i.e., how fast the littering occurs) of non-smokers increases with time. My experiment does not attempt to answer this question. My experiment is designed strictly to test kingdanwa's hypothesis that for any given period, a smoker is more likely to be a general litterer than a non-smoker. Thus, if the littering rate does not change during the course of my experiment, the Weibull model is not superior to the simpler exponential model, especially since we have no a priori evidence to suggest which non-1 value the Weibull parameter should take. Even if the rates did change during the course of the experiment, the biasing effects would likely be so negligible as to be not worth correcting with such an additionally complex model of the inter-littering times, something that isn't even the focus of the study.
If tomorrow the inverse of the inter-littering intervals double for each group, or triple for one and quadruple for the other, I don't care, for that does not affect my conclusions nor the validity of my experimental design. I could repeat my experiment tomorrow under those new conditions and expect to get the same conclusion I get today. The only difference is that I'd expect to get more observations tomorrow afternoon than today.
(Be sure that you are not confusing the stochastic rates under discussion - those of the inter-littering times characterized by an exponential or Weibull distribution - with what the experiment is measuring - the proportion of litterers who are smokers. You nearly confused me on this point. The only reason that distributions of inter-littering times should enter the discussion is if you have an objection similar to RC's - that it is important to observe all instances of littering in the experimental interval in order to get an unbiased estimate of the proportion of litterers who are smokers. Otherwise they are completely irrelevant to what is being studied.)
Originally posted by DoctorScribbles
This objection is entirely different from the one RC raised. His was an objection to the implementation of the proposed experiment.
Your objection is that the proposed experiment is addressing the wrong question altogether.
The proposed experiment addresses the static question of whether at any given period, such as today, smokers are more l ...[text shortened]... rs who are smokers. Otherwise they are completely irrelevant to what is being studied.)
I agree it is different; I never said it was the same as RC's. In fact, I stated it was regarding my point of view, which I'd posted before.
Also, there's no point in setting values for parameters, you can estimate them by your sample. Why would I have to fix a value for any of them before? That makes no sense. If the effects of the extra parameter are negligible, its estimate would be non-significant. You test everything at once.
As for your last point, I was under the impression that what we tested here was the propensity of smokers to litter in comparison with the propensity of non-smokers to litter.
To measure this by the proportion of litterers who are smokers, you must also estimate the proportion of smokers in the general population, but I see your point.
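The link between the two quantities is just Bayes' rule. As a sketch, write L for "litters", S for "smoker" and \bar S for "non-smoker" (notation introduced here, not from the thread):

```latex
\frac{P(L \mid S)}{P(L \mid \bar S)}
  = \frac{P(S \mid L)}{P(\bar S \mid L)} \cdot \frac{P(\bar S)}{P(S)}
```

The first factor on the right is what the sniffing experiment observes; the second needs the share of smokers in the general population. With made-up numbers: if half of the observed litterers are smokers and a quarter of the population smokes, the ratio is (0.5 / 0.5) x (0.75 / 0.25) = 3, so smokers would be three times as likely to litter.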
I was proposing to estimate the rates, but you just wanted to use the empirical averages (I don't like to call empirical averages 'estimates', it's giving them too much credibility 😉 )
In that sense, I agree that I was shooting the rabbit with a heavy machine gun, but nevertheless there are better ways to estimate this than by OLS (which for the specific case would give merely the difference of empirical averages for the binary variable, smoker/non-smoker).
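The OLS remark can be checked directly. The sketch below uses toy data invented for illustration (seven hypothetical passers-by, with `smoker` and `littered` as 0/1 indicators) and hypothetical helper names. It compares the slope from regressing `littered` on `smoker` with the plain difference of the two group averages.

```python
def ols_slope(x, y):
    """Slope of the least-squares line of y on x (with an intercept)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    return sxy / sxx

def mean_difference(x, y):
    """Average of y where x == 1 minus average of y where x == 0."""
    ones = [yi for xi, yi in zip(x, y) if xi == 1]
    zeros = [yi for xi, yi in zip(x, y) if xi == 0]
    return sum(ones) / len(ones) - sum(zeros) / len(zeros)

# Toy data, made up for this example only.
smoker = [1, 1, 1, 0, 0, 0, 0]
littered = [1, 1, 0, 1, 0, 0, 0]

slope = ols_slope(smoker, littered)
gap = mean_difference(smoker, littered)
print(slope, gap)  # both equal 5/12 up to rounding
```

Here 2 of 3 smokers and 1 of 4 non-smokers littered, so the gap is 2/3 - 1/4 = 5/12, and the regression slope matches it.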
Originally posted by Palynka
Also, there's no point in setting values for parameters, you can estimate them by your sample. Why would I have to fix a value for any of them before? That makes no sense.
We were at the stage of evaluating the experimental design, which should be done before conducting the experiment. You can only use the information available. At the current stage, without having collected any experimental data, RC effectively asked whether the inter-littering times were memoryless, in order to ascertain whether the design was inherently biased. I suggested that littering frequencies likely occur in a similar manner to other processes known to be governed by exponential, and thus memoryless, distributions.
You are the one who claimed that they were Weibull rather than exponential. I'm not attempting to fix its parameter value to anything other than 1 (which is just saying that the process is exponential), justified by deferring to other similar processes known to be approximated by that distribution. I'm standing on the shoulders of giants and mounds of empirical evidence to make that estimation; I'm not basing it on nothing. This approach to analyzing the experimental design before it is conducted makes perfect sense to me.
I'm not really sure why you introduced the Weibull in the first place if your objection isn't the same as RC's. The experiment is clearly not concerned with inter-littering times, so while it might be interesting to fit the observed times to a Weibull distribution, that would be completely tangential to what is being studied.
Originally posted by DoctorScribbles
We were at the stage of evaluating the experimental design, which should be done before conducting the experiment. You can only use the information available. At the current stage, without having collected any data, RC effectively asked whether the inter-littering times were memoryless. I suggested that littering frequencies likely occur in a simi ...[text shortened]... s to a Weibull distribution, that would be completely tangential to what is being studied.
I'm standing on the shoulders of giants and mounds of empirical evidence to make that estimation; I'm not basing it on nothing.
Is this supposed to be funny? You're standing on an hypothesis of memorylessness. All your data will take you the wrong way if you're wrong.
RC effectively asked whether the inter-littering times were memoryless. I suggested that littering frequencies likely occur in a similar manner to other processes known to be governed by exponential, and thus memoryless, distributions.
Likely? Shoulders of giants, indeed.
The Weibull is a way of testing your hypothesis of constant rates over time. If it folds back into an exponential, voila, hypothesis not rejected by the data.
I thought you'd welcome a contribution, but I guess you just wanted to play with RC. Sorry for the interruption, then. Carry on.
Originally posted by Palynka
I'm standing on the shoulders of giants and mounds of empirical evidence to make that estimation; I'm not basing it on nothing.
Is this supposed to be funny? You're standing on an hypothesis of memorylessness. All your data will take you the wrong way if you're wrong.
RC effectively asked whether the inter-littering times were memoryless. I s ...[text shortened]... n, but I guess you just wanted to play with RC. Sorry for the interruption, then. Carry on.
We are losing sight of the forest. I would prefer to not test multiple hypotheses with one experiment. One hypothesis is kingdanwa's, which my experiment is designed to test. Another is yours, that inter-littering times are Weibull and not exponential. A third is mine, that inter-littering times are exponential. I agree that performing an experiment to test your hypothesis first would be beneficial to a more accurate analysis of the design of my experiment. But I disagree that the distribution of the inter-littering times is at the heart of the matter being discussed in this thread, and I claim that it would have only the most marginal effect on interpreting the results of my proposed experiment.
Even if I knew that they were decidedly non-exponential, I'd still carry out the experiment as originally described, as it would still give me a satisfactory though marginally biased answer to the main question of interest in this thread: whether smokers are more likely to litter than non-smokers. That's why I'm dismissing your hypothesis, as well as mine that the times are exponential, as being unworthy of experiment. The only reason I even hypothesized the exponential distribution was in refutation to RC's objection. In retrospect, a better refutation would have been, "So what? I don't expect that to significantly bias the observed proportion" without hypothesizing any distribution in support.
I'm happy to withdraw my hypothesis of the exponential distribution and replace it with a claim of ignorance of and indifference to the distribution if anybody so wishes. That way nobody's tangential hypothesis gets any special consideration or endorsement and we can focus on the main one.
I'm also happy to keep bickering over it, if you prefer. I'm just a happy guy all around.
Originally posted by DoctorScribbles
I'm happy to withdraw my hypothesis of the exponential distribution
I accept the withdrawal of your hypothesis. 😉
It was the most interesting part, though. At least for me.
Edit: And just why not make a non-parametric estimation of the whole thing?
Oops, what have I said? This is beyond shooting a rabbit with a heavy machine gun, it's like bombing a country for establishing a stable democracy. Hum.
Originally posted by Palynka
I accept the withdrawal of your hypothesis. 😉
Ok, I withdraw my hypothesis that the inter-littering times are exponential.
RC, both of your observations about my experimental design are important to consider when interpreting the results of the experiment, as both may yield a bias in the observed proportion of litterers who are smokers. However, I don't expect that either will cripple the experiment to such a degree that we can't be confident in its results. If we find ourselves at a borderline decision about whether to accept or reject the null hypothesis, I propose we conduct a secondary experiment to determine whether the inter-littering times are memoryless. I'm willing to take that they are not as the null hypothesis. If that secondary experiment leaves us accepting that null hypothesis, I propose that you then analyze the degree to which the results of the first experiment are likely to be biased in light of the non-memoryless distribution times coupled with the non-instantaneous observe-sniff cycle.