Exercise 6.1 (Q4) [OLD]
  [TA] Di Su    Created at: 0000-00-00 00:00
"Exercise 1 ( Suicide risk index (100%)). According to the news reported in South Chain Morning Post, Hong Kong suicide risk index hit ""risis level"" in Covid-19 fifth wave. The crisis, high and medium levels of the suicide risk index are defined as the average daily number of suicide cases in the past 365 days plus 1, 2, and 3 standard deviations, respectively; see the Centre for Suicide Research and Prevention. The current crisis, high and medium levels are 3.56, 3.20 and 2.84, respectively. In this exercise, we analyze the number of suicide death from 1997 to 2016 in the dataset suicideHK.csv. Write \(x\sim\text{NB}[\mu, \sigma^2]\) if \(x\) follows a negative binomial distribution with mean \(\mu=(1-p)r/p\) and variance \(\sigma^2=\mu/p\). In this case, \(x\) is the number of failures until \(r\) successes in a sequence of Bernoulli trials with success probability \(p\). In R, we may generate it by rnbinom(1, size=r, prob=p). The PMF is \[f(x) = {{x+r-1}\choose{x}} p^r (1-p)^x \mathbb{1}(x=0,1,2,\ldots).\] Assume that \[\begin{aligned}x &\sim \text{NB}[\theta, 10\theta] , \\ f(\theta) &\propto \left\{ \begin{array}{ll} e^{-(L-\theta)/\theta_0} & \text{if $0<\theta< L$};\\ 1 & \text{if $L\leq \theta \leq U$};\\ e^{-(\theta-U)/\theta_0} & \text{if $\theta> U$}, \end{array}\right.\end{aligned}\] where \(\theta_0 = 5\), \(L=940\) and \(U=970\) are selected according to prior belief. The goals of this exercise are: (A) inference on the daily mean suicide number \(\phi = \theta/365\); and (B) redefinition of the three levels.
  1. See Q1
  2. See Q2
  3. See Q3
  4. Goal B: Redefinition of the three levels. Assume that \([x_1, \ldots, x_n, x_{n+1}\mid \theta]\) are iid.
    1. (10%) Which two methods in part 2 are directly applicable for computing the posterior predictive distribution \([x_{n+1}/365\mid x_{1:n}]\)?
    2. (10%) Compute the confidence estimators \(\widehat{\alpha}\) of the intervals \([0, 3.56]\), \([0, 3.20]\) and \([0, 2.84]\).
    3. (10%) Compute a 99% right tail credible interval of \(x_{n+1}/365\).
    4. (10%) Propose a way to redefine the crisis, high and medium levels with statistical interpretation. Comment with \(\lesssim 50\) words.
Hints: See Remark 2.1. Don't read the hints unless you have no idea and have tried for more than 16 minutes.
Remark 1 (Hints for Exercise 2.).
  1. There are several computational tricks you need to learn in this exercise:
    • Computing posterior density when it is not a function of low dimensional sufficient statistics.
    • Rescaling the posterior density by a factor that does not depend on the input \(\theta\).
    • Using logarithm of some commonly used mathematics functions, e.g., \(\log{A\choose B}\) can be computed by lchoose(A,B) in R. Other examples include lgamma and lfactorial.
    • Using rowSums and colSums instead of a for-loop to speed up computation.
    • Distinguishing the difference between ifelse and `if` in R.
    • Writing a recursive R function, i.e., a function that calls itself.
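    A few quick illustrations of some of these tricks (the numbers are made up for demonstration only):

    lchoose(10, 3)                     # log of choose(10,3); stable even for large arguments
    log(choose(10, 3))                 # same value, but choose() itself can overflow
    M = matrix(1:6, nrow=2)
    rowSums(M)                         # vectorized row sums; no for-loop needed
    ifelse(c(TRUE, FALSE), "a", "b")   # vectorized: returns c("a","b")
    `if`(TRUE, "a", "b")               # scalar: returns "a" only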
    The (unnormalized) posterior density can be computed as follows. Please try writing the function yourself before reading the hints below.
    
    

    posterior = function(theta, x, log=FALSE, rescaling=TRUE){
        # Preliminary definition
        theta0 = 5
        L = ...                    # <<< Complete this line
        U = ...                    # <<< Complete this line
        p = ...                    # <<< Complete this line
        r = theta/9
        # unnormalized density
        n = length(x)
        S = sum(x)
        A = rowSums(outer(r, x, function(r,x) lchoose(x+r-1, x)))
        logd0 = ...                # <<< Complete this line
        # rescaling the unnormalized density
        # (Note: It is more general than the trick used in A3. Why?)
        if(rescaling){
            t = seq(from=900, to=1500, length=101)
            C = max(posterior(t, x, log=TRUE, rescaling=FALSE))
            logd0 = logd0 - C
        }
        `if`(log, logd0, exp(logd0))
    }
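    Once the "..." lines are completed, a quick sanity check is to evaluate the function on a grid and plot it. A usage sketch only (the assumption that suicideHK.csv contains a single column of yearly counts should be checked against the actual file):

    x = read.csv("suicideHK.csv")[, 1]   # assumption: one column of yearly counts
    t = seq(from=900, to=1050, length=301)
    plot(t, posterior(t, x), type="l",
         xlab=expression(theta), ylab="rescaled unnormalized posterior")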

    1. Similar to Example 7.2.
    2. Similar to Example 7.4 but you may need to use a proposal other than the normal distribution. You may wish to use the following fact: Let \(\theta \sim \text{Ga}(\alpha)/\beta\). If \(\mu = \mathsf{E}(\theta)\) and \(\sigma^2 = {\mathsf{Var}}(\theta)\), then \[\alpha = \frac{\mu^2}{\sigma^2} \qquad \text{and} \qquad \beta = \frac{\mu}{\sigma^2}.\] So, you can generate Gamma RVs easily by specifying \(\mu\) and \(\sigma\) instead of \(\alpha\) and \(\beta\) as follows:
      
      

      mu = 3
      sigma = 5
      theta = rgamma(100000, mu^2/sigma^2, mu/sigma^2)
      c(mean(theta), sd(theta))
      [1] 2.998766 4.980204

    3. Similar to Example 7.6.
    4. Similar to Example 7.8 but you need to do a similar modification as in part 2(b).
  2. Similar to the experiment in Example 7.8.
    1. Which methods generate posterior sample of \(\theta\)?
    2. You may use the following steps:
      • Step 1: generate posterior sample of \(\theta\). (It is done in part 2).
      • Step 2: for each posterior draw of \(\theta\), generate \(x_{n+1}\) according to the \(\text{NB}\) model.
      • Step 3: use the posterior draws of \(x_{n+1}\) to complete the task.
    3. Open-ended.
"
Difficulty: 4.1 (scale: Easy–Difficult)    Number of votes: 10
the suicide risk index
 Anonymous Orangutan    Created at: 2022-04-21 14:13  6 
the suicide risk index are defined as the average daily number of suicide cases in the past 365 days plus 1, 2, and 3 standard deviations, respectively.

I would like to know for which quantity the standard deviation is computed: $x$, $x/365$, $\theta$, or $10\theta$?
In this question, we use yearly data, not daily data, so I am confused.

Thanks for your help.
  [TA] Di Su    Created at: 2022-04-21 17:02  2 
It should be the standard deviation of $x/365$.
 Anonymous Hedgehog    Created at: 2022-04-24 20:30  3 
I have a question.
“The crisis, high and medium levels of the suicide risk index are defined as the average daily number of suicide cases in the past 365 days plus 1, 2, and 3 standard deviations”
Should it be crisis = mean + 3sd, high = mean + 2sd, medium = mean + 1sd instead?
  [TA] Di Su    Created at: 2022-04-25 16:45  2 
I think you are correct. The crisis level should have the largest value.
Alternative method for Q4(b)(c) ?
 Anonymous Hedgehog    Created at: 2022-04-24 18:21  3 
The hints said that we can use posterior sample of $\theta$ to generate $x_{n+1}$ .

However, $f(x_{n+1}|x_{1:n}) = \int_0^\infty \pi (\theta |x_{1:n})f(x_{n+1} |\theta )d\theta = \frac{\int_0^\infty \pi_u (\theta |x_{1:n})f(x_{n+1} |\theta )d\theta }{\int_0^\infty \pi_u (\theta |x_{1:n} )d\theta },$
where for each fixed $x_{n+1}$ , $f(x_{n+1}|\theta )$ is just a function of $\theta$

Can I instead take $x_{n+1}=0,1,\ldots,2000$ and, for each $x_{n+1}$, evaluate the above integrals to find $f(x_{n+1}|x_{1:n})$ directly?

Then for (b) I can easily find $\hat\alpha$ by calculating $P(x_{n+1}\in I\mid x_{1:n})$.
  [TA] Di Su    Created at: 2022-04-25 17:10  1 
Yes, you can. Please show the details in your answer, and make sure the densities, integrals, and the confidence level are approximated in a correct way.
In practice, you can compare the time of sampling from the sampling distribution with that of calculating the integrals, and decide which is preferable.
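For concreteness, here is a minimal sketch of the grid-based computation (assumptions: the completed posterior() function from the hints above and the data vector x are available; the grid endpoints are illustrative):

theta.grid = seq(800, 1100, length=601)
w = posterior(theta.grid, x)                 # unnormalized pi_u(theta | x_{1:n}) on a grid
w = w / sum(w)                               # normalize over the grid
xnew.grid = 0:2000
f.pred = sapply(xnew.grid, function(xn)      # f(x_{n+1} | x_{1:n}) approximated on the grid
  sum(w * dnbinom(xn, size=theta.grid/9, prob=0.1)))
sum(f.pred[xnew.grid/365 <= 3.56])           # e.g., P(x_{n+1}/365 <= 3.56 | x_{1:n})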
What does $\widehat{\alpha}$ mean in Q4(b)?
 Anonymous Dolphin    Last Modified: 2022-04-26 22:26  7 
I am sorry I have to ask something really stupid, but I would like to ask whether $\widehat{\alpha}$ represents the confidence level of the 3 intervals or the probability of falling within them?
Thanks.
  [Instructor] Kin Wai (Keith) Chan    Created at: 2022-04-27 01:12  8 
The confidence estimator $\widehat{\alpha}$ means the posterior probability that $x_{n+1}/365$ falls in the desired interval, i.e., $$\widehat{\alpha} = \mathsf{P}\left( \frac{x_{n+1}}{365} \in \text{an interval}\mid x_{1:n}\right).$$
P.s.: I also noticed this point of confusion. I will revise the notation next year. Anyway, I will make the definition clear if it appears in the final project.
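In code, once posterior predictive draws of $x_{n+1}$ are available, $\widehat{\alpha}$ is just the proportion of draws falling in the interval. A minimal sketch, assuming xnew should hold your posterior predictive draws (the rnbinom line below is only a placeholder):

set.seed(1)
xnew = rnbinom(10000, size=950/9, prob=0.1)   # placeholder draws; use your own sample
sapply(c(3.56, 3.20, 2.84), function(C) mean(xnew/365 <= C))   # alpha-hat for the 3 intervals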
Right tail CI
 Anonymous Orangutan    Created at: 2022-04-26 17:52  0 
I am not sure what a right tail credible interval is.
Is it equal to the code below?
quantile(target, c(0, 0.99))
  [Instructor] Kin Wai (Keith) Chan    Created at: 2022-04-27 01:52  6 
  • A right-tailed interval means that the right tail is of interest. So, we wish to find
    \[\text{Right-tailed interval} = ( \text{minimum}, C )\]
    for some $C$. In our case, the minimum possible value of $x_{n+1}/365$ is zero. So the right-tailed interval is of the form $[0,C]$.
  • As a remark, let me take $z$-test as an example.
    • (Hypothesis test) If we suspect $\mu$ is larger than some value, then the right tail is of interest. So, we should use a right-tailed test to test $H_0: \mu \leq \mu_0$ against $H_1: \mu > \mu_0$.
    • (Confidence interval) If we wish to understand the probable upper value of $\mu$, then we should use a right-tailed confidence interval of the form $(\text{minimum}, C)$.
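In code, a minimal sketch (assumption: xnew should hold your posterior predictive draws of $x_{n+1}$; the rnbinom line is only a placeholder):

set.seed(1)
xnew = rnbinom(10000, size=950/9, prob=0.1)   # placeholder draws
c(0, quantile(xnew/365, 0.99))                # 99% right-tailed interval [0, C]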
Inquiry on Q4c
 Anonymous Dolphin    Created at: 2022-04-26 22:30  3 
In order to finish Q4(c), my idea is to first plug the $\theta$s we generated by either the MH algorithm or the reweighting IS method into $\text{NB}[\theta,10\theta]$, in which $p$ and $r$ should equal $0.1$ and $\theta/9$. Is it correct?
Thank you so much!

  [Instructor] Kin Wai (Keith) Chan    Created at: 2022-04-27 02:05  11 
It is correct. A general procedure for doing Bayesian prediction is as follows:
  • Step 0: build a model, e.g., $[x_1, \ldots, x_n \mid \theta] \sim \text{Po}(\theta)$.
  • Step 1: generate posterior sample $[\theta \mid x_{1:n}]$ by, e.g., MH algorithm. Denote the MH sample by $\theta_1, \ldots, \theta_J$.
  • Step 2: For each $j=J/2+1, \ldots, J$, generate a prediction of $x_{n+1}$ as $x_{n+1, j}\sim \text{Po}(\theta_j)$.
  • Step 3: Infer $x_{n+1}$ by
    • (a) Point prediction: One may approximate $\mathsf{E}(x_{n+1}\mid x_{1:n})$ by $$\overline{x}_{n+1} = \frac{1}{J/2}\sum_{j=J/2+1}^J x_{n+1, j}.$$
    • (b) Interval prediction: One may find an approximate $95\%$ credible interval of $x_{n+1}$ by $$\left[\widehat{Q}_{2.5\%}, \widehat{Q}_{97.5\%}\right],$$ where $\widehat{Q}_p$ is the $100 p\%$ sample quantile of $\{x_{n+1, j}\}_{j=J/2+1}^J$.
Remark: If we are asked to do inference (or infer something), then we may do, e.g., point estimation, interval estimation and testing.
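For concreteness, a minimal runnable sketch of these steps for the toy Poisson model above (assumptions: a flat prior on $(0,\infty)$ and a random-walk MH proposal, neither of which is specified in the steps; all numbers are illustrative):

set.seed(1)
x = rpois(20, lambda=5)                         # Step 0: toy data
logpost = function(theta)                       # log posterior under a flat prior
  if (theta <= 0) -Inf else sum(dpois(x, theta, log=TRUE))
J = 10000
theta = numeric(J); theta[1] = mean(x)
for (j in 2:J) {                                # Step 1: random-walk MH
  prop = theta[j-1] + rnorm(1, 0, 0.5)
  accept = log(runif(1)) < logpost(prop) - logpost(theta[j-1])
  theta[j] = if (accept) prop else theta[j-1]
}
keep = theta[(J/2+1):J]                         # discard burn-in
xnew = rpois(length(keep), keep)                # Step 2: one draw of x_{n+1} per theta_j
mean(xnew)                                      # Step 3(a): point prediction
quantile(xnew, c(0.025, 0.975))                 # Step 3(b): approximate 95% credible interval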
  [Instructor] Kin Wai (Keith) Chan    Created at: 2022-04-27 10:10  6 
You may also think about the following more general Bayesian prediction problem in time series: Let $[x_i \mid x_{i-1}, \beta] \sim \text{Normal}(\beta x_{i-1}, 1)$ for $i=1, \ldots, n$, where $x_0$ is fixed.
Q: How to predict $x_{n+1}$ and $x_{n+2}$ given $x_{1:n}$?
A: Do as follows:
  • Step 1: generate posterior sample $\beta_1, \ldots, \beta_J$ from $[\beta\mid x_{1:n}]$.
  • Step 2: For each $j=J/2+1, \ldots, J$, generate $x_{n+1,j}$ and $x_{n+2,j}$ by $$x_{n+1, \color{red}{j}} \sim \text{Normal}(\beta_{\color{red}{j}} x_{n}, 1)\qquad \text{and}\qquad x_{n+2, \color{red}{j}} \sim \text{Normal}(\beta_{\color{red}{j}} x_{n+1,\color{red}{j}}, 1).$$
    Note that $x_{n+1,\color{red}{j}}$ and $x_{n+2,\color{red}{j}}$ are generated sequentially under the same $\beta_{\color{red}{j}}$.
  • Step 3: …
More examples of Bayesian time series modelling can be found in the following examples.
  • Example 2.28 discusses a standard AR(1) model.
  • Example 3.28, Example 3.37, and Example 7.9 are also related.
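As a concrete illustration of Step 2 above, a minimal sketch (assumptions: beta.post stands in for posterior draws of $\beta$ from Step 1 and xn for the observed $x_n$; both are placeholders):

set.seed(1)
beta.post = rnorm(5000, 0.5, 0.05)                 # placeholder posterior draws of beta
xn = 1.2                                           # placeholder for the observed x_n
x1 = rnorm(length(beta.post), beta.post * xn, 1)   # x_{n+1,j} given beta_j
x2 = rnorm(length(beta.post), beta.post * x1, 1)   # x_{n+2,j}: sequential, same beta_j
c(mean(x1), mean(x2))                              # point predictions of x_{n+1} and x_{n+2}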
 Anonymous Dolphin    Created at: 2022-04-27 14:10  0 
Thank you so much!!!
 Anonymous Loris    Created at: 2022-04-28 21:13  1 
For a given $\theta_j$, why don't we generate many $x_{n+1}$ samples instead of only one?
Even if we fix the parameter, there is still randomness in the DGP (data generating process). So I think simulating more $x_{n+1}$ for each $\theta$ is more reasonable.
Thanks!
  [TA] Di Su    Last Modified: 2022-05-11 22:13  0 
It is because we are working with the posterior predictive distribution, which is not conditioned on $\theta$, instead of the sampling distribution; we need to take the randomness in $\theta$ that comes from the posterior into account. You may roughly understand it using the representation: given $x_{1:n}$, $$x_{n+1}=\beta x_{n}+z$$ for some $z\sim N(0,1)$, where $\beta$ follows the posterior distribution. In simulation, $\beta_1,\dots,\beta_J$ are samples from the posterior; hence we simulate the posterior predictive as $$x_{n+1,j}=\beta_j x_{n}+z_j,\qquad j=1,\dots,J.$$
Redefinition of three levels
 Anonymous Loris    Last Modified: 2022-04-29 16:02  4 
I am confused about how to use the posterior predictive to define new crisis levels
  1. Since the $\theta$s generated in the MH algorithm are not independent of each other, there is also serial correlation among our posterior predictive samples. Due to the dependence, the standard deviation given by those samples may be inflated. Therefore, how can we estimate the sd of $x_{n+1}$ accurately?
  2. Can I use some quantiles of the posterior predictive distribution to define crisis levels, and then give them corresponding interpretations?
Many thanks!
  [Instructor] Kin Wai (Keith) Chan    Created at: 2022-04-28 11:32  8 
It is a good question.

MCMC generates a dependent sample $\theta_1, \ldots, \theta_J$ (say). Asymptotically (as $J\rightarrow\infty$), we know that $\theta_{J/2+1}, \ldots, \theta_J \approx f(\theta \mid x_{1:n})$, which implies that
\[
\mathsf{E}_{\#}(\theta_j) \approx \mathsf{E}(\theta \mid x_{1:n}), \qquad j=J/2+1, \ldots, J,
\]
where
  • the expectation $\mathsf{E}_{\#}(\cdot)$ measures the randomness from the MCMC algorithm (because MCMC is a stochastic algorithm), whereas
  • the expectation $\mathsf{E}(\cdot\mid x_{1:n})$ measures the randomness from the posterior $[\theta \mid x_{1:n}]$.
By the method of moment, we can construct an estimator of $\mathsf{E}(\theta \mid x_{1:n})$ as follows:
\[
\widehat{\mu} = \frac{1}{J/2} \sum_{j=J/2+1}^J \theta_j.
\]
It is a reasonable estimator according to Theorem 7.12 (Ergodic theorem), which states the $\mathsf{P}_{\#}$-almost sure convergence of the estimator $\widehat{\mu}$. Alternatively, you may also argue as follows:
  • (Bias) Taking expectation $\mathsf{E}_{\#}(\cdot)$ on both sides, it is not hard to see that
    \begin{align}
    \mathsf{E}_{\#}(\widehat{\mu})
    &= \frac{1}{J/2} \sum_{j=J/2+1}^J \mathsf{E}_{\#}(\theta_j) \\
    &\approx \frac{1}{J/2} \sum_{j=J/2+1}^J \mathsf{E}(\theta \mid x_{1:n}) \\
    &= \mathsf{E}(\theta \mid x_{1:n}).
    \end{align}
  • (Variance) The variance $\mathsf{Var}_{\#}(\cdot)$ can be found by using Theorem 7.13 (together with some extra regularity condition that ensures the existence of the variance):
    \begin{align}
    \mathsf{Var}_{\#}(\widehat{\mu})
    &\approx \frac{1}{J/2} \sigma^2_{\text{MCMC}}\\
    &\rightarrow 0 \qquad \text{as $J\rightarrow \infty$}.
    \end{align}
Combining the bias and variance, we know that $\widehat{\mu}$ is $\mathcal{L}^2(\mathsf{P}_{\#})$ consistent for $\mathsf{E}(\theta\mid x_{1:n})$, where $\mathsf{P}_{\#}$ is the probability measure driven by the MCMC algorithm, and $\mathcal{L}^2(\mathsf{P}_{\#})$ consistency means that the mean squared error under the law $\mathsf{P}_{\#}$ goes to zero. All the results above can be applied to the estimation of $\mathsf{E}(\theta^2\mid x_{1:n})$. Clearly, a method of moment estimator is $\widehat{\mu}_2 = \frac{1}{J/2}\sum_{j=J/2+1}^J \theta_j^2$. Combining the results, we have
\[
\widehat{\sigma}^2 := \widehat{\mu}_2 - \widehat{\mu}^2 = \left( \frac{1}{J/2}\sum_{j=J/2+1}^J \theta_j^2\right) - \left( \frac{1}{J/2}\sum_{j=J/2+1}^J \theta_j\right)^2 \approx \mathsf{Var}(\theta\mid x_{1:n}).
\]
For the quantile estimator, you may argue similarly as above. So, we have
\[
\text{Sample $100p\%$ quantile of $\{\theta_{J/2+1}, \ldots, \theta_J\}$} \approx Q_{p}(\theta \mid x_{1:n}).
\]
Final remarks:
  • As long as $\theta_j$ follows the target distribution marginally, we can construct the method of moment (MoM) estimators as in the iid case. The serial dependence does not affect the construction of the MoM estimator.
  • The serial dependence does affect the properties (e.g., precision) of the MoM estimator as $\theta_j$ and $\theta_{j+1}$ can be highly dependent on each other. In this case, $\theta_j$ and $\theta_{j+1}$ can be effectively the same observations, which makes one of them practically useless in improving the precision of the MoM estimator.
  • Note also that there are two probability measures mentioned in the above explanation: $\mathsf{P}_{\#}(\cdot)$ and $\mathsf{P}(\cdot \mid x_{1:n})$. They mean different things as the sources of randomness are different. The simplest way to see it is that
    \[
    \mathsf{Var}_{\#}(\theta) \neq \mathsf{Var}(\theta \mid x_{1:n}).
    \]
  [Instructor] Kin Wai (Keith) Chan    Created at: 2022-04-28 11:52  6 
You can also use the following time series example to understand the problem. For $j=1,2,\ldots$, let
\[
\theta_j = a \theta_{j-1}+ \varepsilon_j, \qquad \varepsilon_j \sim \text{Normal}(0,1),
\]
where $a\in(-1,1)$. Then $\theta_1, \ldots, \theta_J$ are serially dependent. But each $\theta_j\sim \text{Normal}(0, 1/(1-a^2))$. You can use the sample mean, the sample standard deviation, and sample quantile to estimate the corresponding population quantities. You may try the following experiment in R.
set.seed(1)
a = 0.5
# theta[j] ~ N(0, 1/(1-a^2))
theta = arima.sim(1000, model=list(ar=a))
estimator = c(mean(theta), sd(theta), quantile(theta,0.25))
truth     = c(0, 1/sqrt(1-a^2), qnorm(0.25,0,1/sqrt(1-a^2)))
result    = cbind(estimator, truth)
rownames(result) = c("mean","sd","quantile")
result 
Here is the result:
> result
           estimator      truth
mean     -0.02834617  0.0000000
sd        1.15648768  1.1547005
quantile -0.77606159 -0.7788337

Q4(b)(c): samples generated by which method should be used?
 Anonymous Ifrit    Last Modified: 2022-04-28 00:45  2 
When we work on (b)-(c), should we not use the samples generated by reweighting IS, as we see in Example 7.6 that the variance of the samples generated by reweighting IS is higher?
  [Instructor] Kin Wai (Keith) Chan    Created at: 2022-04-28 11:35  5 
The estimator based on reweighting importance sampling is still consistent and correct, although it has a larger variance than the corresponding importance sampling estimator. In this example, the goal is to help you warm up with all the methods introduced in the lecture notes.
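A generic illustration of the variance comparison (this is not necessarily the setting of Example 7.6: here the target is $N(1,1)$, the proposal is $N(0,2^2)$, and "reweighting" is implemented as resample-by-weight-then-average):

set.seed(1)
one.run = function(n = 1000) {
  th = rnorm(n, 0, 2)                                   # proposal draws
  w  = dnorm(th, 1, 1) / dnorm(th, 0, 2)                # importance weights
  is.est  = sum(w * th) / sum(w)                        # self-normalized IS estimate of E(theta)
  ris.est = mean(sample(th, n, replace=TRUE, prob=w))   # resample by weight, then average
  c(is.est, ris.est)
}
est = replicate(2000, one.run())
apply(est, 1, var)   # the resampling step adds extra Monte Carlo noise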
R code for generating NB samples
 Anonymous Ifrit    Last Modified: 2022-04-28 01:07  2 
Could I just use one line of code to generate the NB samples, instead of generating them one by one? i.e.:
theta = c(1.3,2.7,3.412,4.98,5.01) # some arbitrary values of theta, we have 5 samples of theta now

# method 1: generate samples one by one
NB.sample = rep(NA,5) # create a vector to store the output values
for (i in 1:5){
 NB.sample[i] = rnbinom(1,size = theta[i]/9,prob = 1/10) # generate 1 sample each step
}

# method 2: generate all the samples at the same time
NB.sample = rnbinom(5,size = theta/9,prob = 1/10)
For method 2, will the NB.sample generated match the order of theta? For example, does the 3rd element of NB.sample correspond to the NB sample generated using the 3rd value of theta, i.e. rnbinom(1,size = theta[3]/9,prob = 1/10)?
  [Instructor] Kin Wai (Keith) Chan    Created at: 2022-04-28 11:38  6 
Yes, you can do it either way. They are identical. You may set the seed and do an experiment as follows:
> theta = c(1.3,2.7,3.412,4.98,5.01) # some arbitrary values of theta, we have 5 samples of theta now
> 
> # method 1: generate samples one by one
> set.seed(1)
> NB.sample = rep(NA,5) # create a vector to store the output values
> for (i in 1:5){
+  NB.sample[i] = rnbinom(1,size = theta[i]/9,prob = 1/10) # generate 1 sample each step
+ }
> NB.sample
[1]  0 14  3 12  0
> 
> # method 2: generate all the samples at the same time
> set.seed(1)
> NB.sample = rnbinom(5,size = theta/9,prob = 1/10)
> NB.sample
[1]  0 14  3 12  0
Part 4(d) Redefining the levels
 Anonymous Ibex    Created at: 2022-04-29 01:32  1 
May I ask whether there is any problem with redefining the levels using the following approach:
  1. Use the value (the mean of $\theta$ conditional on $x_{1:n}$) calculated by the MH algorithm in part 2(d) to serve as the $\theta$ in $\text{NB}[\theta, 10\theta]$.
  2. Then, one can prove that $[x_{n+1}/365 \mid x_{1:n}] \sim \text{NB}[\theta/365, 10\theta/365^2]$.
  3. Redefine the crisis level by calculating $\theta/365 + 3\sqrt{10\theta}/365$ (mean + 3 s.d.),
where $\theta$ is the value found in part 2(d) using the MH algorithm.
I am not sure whether it is OK to directly replace $\theta$ by the value I found in part 2(d), since this value is only the mean of $\theta$ and not necessarily the true value of $\theta$.

  [TA] Di Su    Created at: 2022-04-29 15:48  1 
You can use the posterior samples of $\theta$ found by MH in part 2(d). Then you can redefine the levels using samples from the posterior predictive. Notice that $\theta$ is a random variable so there is no “true” value for it.
 Anonymous Ibex    Created at: 2022-04-29 16:36  0 
I get it now!!! Thanks for reminding me that theta is a random variable!!!
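For concreteness, a minimal sketch of the mean-plus-k-sd redefinition based on posterior predictive draws (assumption: xnew should hold your MH-based posterior predictive draws of $x_{n+1}$; the rnbinom line is only a placeholder):

set.seed(1)
xnew = rnbinom(10000, size=950/9, prob=0.1)   # placeholder draws
phi = xnew/365                                # draws of x_{n+1}/365
new.levels = mean(phi) + 3:1 * sd(phi)        # crisis, high, medium
names(new.levels) = c("crisis", "high", "medium")
new.levels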
(d) Redefining
 Anonymous Orangutan    Created at: 2022-04-29 07:31  1 
Originally, the index is defined as the average daily number of suicide cases in the past 365 days plus 1, 2, and 3 standard deviations.
In this question, do we need to follow this definition and just change the values based on the calculated mean and sd?
Or can I use my own proposed definition? For instance, $\mathsf{P}(\theta/365 \in \text{crisis}) = 0.99$.

  [TA] Di Su    Last Modified: 2022-04-29 15:51  0 
You definitely can use your own idea as long as you briefly explain why you think it makes sense.
Q4(d)
 Anonymous Liger    Created at: 2022-04-29 16:42  0 
For Q4(d), can we explain the three levels in terms of quantiles for the statistical interpretation? Also, do we need to give specific numbers after redefining the levels?
  [TA] Di Su    Created at: 2022-04-29 20:14  0 
Yes, you can explain it in the sense of quantiles. And I think you can immediately obtain the specific values of the new levels.
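For instance, a minimal sketch of a quantile-based redefinition (assumption: phi should hold your posterior predictive draws of $x_{n+1}/365$; the first line is only a placeholder):

set.seed(1)
phi = rnbinom(10000, size=950/9, prob=0.1)/365   # placeholder draws of x_{n+1}/365
quantile(phi, c(0.90, 0.95, 0.99))               # e.g., medium, high, crisis levels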