About Chapter 5
  [Developer] Kai Pan (Ben) Chu    Created at: 0000-00-00 00:00   Chp5General 2 
Questions / discussions related to Chapter 5 should be placed here.
HPD: normalization step
 Anonymous Orangutan    Last Modified: 2022-03-21 13:26  1 
I'm still not sure about the normalization step.
① To me it seems odd that sum(d0) and the grid width are kept separate.
Is it possible to rewrite it as below?
② Also, I computed d and I think d is not normalized.
Although it is fine to multiply by (theta[2]-theta[1]) again when we compute N,
I don't know how to interpret the y-axis of the posterior distribution (from 0 to 5 in this example).
Why don't we use d = d0/sum(d0) when we plot (theta, d)?
HPD = function(x, a=0, b=1, alpha=0.005, posterior){
  # step 1: values of posterior at different values of theta in [a,b]
  theta = seq(from=a, to=b, length=301)
  d0 = posterior(theta, x)
  d = d0/sum(d0)/(theta[2]-theta[1])
  # equivalent??
  d = d0/sum(d0*(theta[2]-theta[1]))   # denominator: Σ{ d0(θi) × (θ[2]-θ[1]) }

Checking d in the console:
> sum(d)
[1] 300
> sum(d*theta)
[1] 143.9699

  ...

  N = sum(cumsum(d[O])*(theta[2]-theta[1]) < 1-alpha) + 1
  plot(theta, d, type="l", lwd=2, col="red4",
       xlab=expression(theta),
       ylab=expression(pi(theta~"|"~italic(x[1:n]))))
  [TA] Di Su    Last Modified: 2022-03-23 20:42  2 
① Yes, it is fine to write it that way.
② For a continuous random variable $X$, it is possible that $f_X(x)>1$ at some points; what we need is $\int_\mathcal{X}f_X(x)\,\mathrm{d}x=1$. So y-axis values between 0 and 5 are perfectly valid density values.
We don't use d = d0/sum(d0) when we plot (theta, d) because we want d to approximate the posterior density itself, so its Riemann sum should be about 1; that is why the term (theta[2]-theta[1]) is needed.
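To see this concretely, here is a minimal sketch; the Beta(3, 2) density below is only a made-up stand-in for posterior(theta, x). It shows that d is normalized in the Riemann-sum sense, and that the plotted y-axis values are density values, which may well exceed 1.
# minimal sketch; dbeta(., 3, 2) is a hypothetical stand-in for posterior(theta, x)
theta = seq(from = 0, to = 1, length = 301)
d0 = dbeta(theta, 3, 2)
d  = d0 / sum(d0) / (theta[2] - theta[1])
sum(d * (theta[2] - theta[1]))   # Riemann sum of d: equals 1 (up to rounding)
sum(d)                           # 300 = 1 / grid width, not 1
max(d)                           # about 1.78; a density value above 1 is fine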
credible interval
 Anonymous Mink    Created at: 2024-03-29 17:56  0 
May I ask whether there is any frequentist-style language for the credible interval, analogous to using a confidence interval to reject $H_0$ at some confidence level, something like that?
Exercise 4.2
  [TA] Chak Ming (Martin), Lee    Last Modified: 2024-03-23 10:41   A4Ex4.2 4
Related last year's exercise and discussion can be found here.
Exercise 2 (Horse racing (40%)). The Hong Kong Jockey Club (HKJC) organizes approximately 700 horse races every year. This exercise analyses the effect of the draw on winning probability. According to HKJC:
The draw refers to a horse’s position in the starting gate. Generally speaking, the smaller the draw number, the closer the runner is to the insider rail, hence a shorter distance to be covered at the turns and has a slight advantage over horses with bigger draw numbers.
The dataset horseRacing.txt, which is a modified version of the dataset in the GitHub project HK-Horse-Racing, can be downloaded from the course website. It contains all races from 15 Sep 2008 to 14 July 2010. There are six columns:
race (integer): race index (from 1 to 1364).
distance (numeric): race distance in meters (1000, 1200, 1400, 1600, 1650, 1800, 2000, 2200, 2400).
racecourse (character): racecourse ("ST" for Shatin, "HV" for Happy Valley).
runway (character): type of runway ("AW" for all-weather track, "TF" for turf).
draw (integer): draw number (from 1 to 14).
position (integer): finishing position (from 1 to 14), i.e., position=1 denotes the first horse to complete the race.
The first few lines of the dataset are shown below. In this example, we consider all races that (i) took place on the turf runway of the Shatin racecourse; (ii) were of distance 1000m, 1200m, 1400m, 1600m, 1800m or 2000m; and (iii) used draws 1–14. Let
\(n\) be the total number of races satisfying the above conditions;
\(\texttt{position}_{ij}\) be the position of the horse that used the \(j\)th draw in the \(i\)th race for each \(i,j\).
For each \(i=1, \ldots, n\), denote \[x_i = \mathbb{1}\bigg( \frac{1}{|\texttt{draw}_i\cap[1,7]|}\sum_{j\in\texttt{draw}_i\cap[1,7]} \texttt{position}_{ij} < \frac{1}{|\texttt{draw}_i\cap[8,14]|}\sum_{j\in\texttt{draw}_i\cap[8,14]} \texttt{position}_{ij} \bigg),\] Denote the entire dataset by \(D\) for simplicity. Suppose that \[\begin{aligned} \left[x_i \mid \theta_{\texttt{distance}_i} \right] & \overset{ {\perp\!\!\!\!\perp } }{\sim} & \text{Bern}(\theta_{\texttt{distance}_i}), \qquad i=1,\ldots,n\label{eqt:raceModel1}\\ \theta_{1000},\theta_{1200},\theta_{1400},\theta_{1600},\theta_{1800},\theta_{2000} & \overset{ \text{iid}}{\sim} & \pi(\theta),\label{eqt:raceModel2}\end{aligned}\] where \(\pi(\theta) \propto \theta^2(1-\theta^2)\mathbb{1}(0<\theta<1)\) is the prior density.
(10%) What are the meanings of the \(x\)’s and \(\theta\)’s?
(10%) Test \(H_0: \theta_{1000}\leq 0.5\) against \(H_1: \theta_{1000}> 0.5\).
(10%) Compute a 95% credible interval for each of \(\theta_{1000},\theta_{1200},\ldots,\theta_{2000}\). Plot them on the same graph.
(10%) Interpret your results in part (3). Use no more than about 100 words.
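For reference, here is a rough sketch of how the \(x_i\) in Exercise 2 above could be constructed. The read.table call (whitespace-separated file with a header) and the way races with an empty draw group are dropped are my own assumptions, not the official solution; adapt as needed.
# rough sketch (file layout and handling of edge cases are assumptions)
races = read.table("horseRacing.txt", header = TRUE)
keep  = races$racecourse == "ST" & races$runway == "TF" &
        races$distance %in% c(1000, 1200, 1400, 1600, 1800, 2000) &
        races$draw %in% 1:14
races = races[keep, ]
# x_i = 1 if draws 1-7 have a smaller average finishing position than draws 8-14 in race i
x = sapply(split(races, races$race), function(r) {
  small = r$position[r$draw <= 7]
  big   = r$position[r$draw >= 8]
  if (length(small) == 0 || length(big) == 0) return(NA)
  as.numeric(mean(small) < mean(big))
})
x = x[!is.na(x)]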
small typo "H1" in Ex4.2
 Anonymous Mink    Created at: 2024-03-29 12:06  1 
Example 3.2
  [TA] Chak Ming (Martin), Lee    Created at: 0000-00-00 00:00   Chp3Eg2 0 
Example 3.2 (${\color{blue}\star}{\color{gray}\star}{\color{gray}\star}$ Binomial model). Let \(x\sim \text{Bin}(n,\theta)\). Consider the Jeffreys prior and the flat prior: \[f_1(\theta) \propto \theta^{-1/2}(1-\theta)^{-1/2} \qquad\text{and}\qquad f_2(\theta) \propto 1,\] for \(\theta\in(0,1)\). The corresponding posteriors are \([\theta\mid x] \sim \text{Beta}(x+1/2, n-x+1/2)\) and \([\theta\mid x] \sim \text{Beta}(x+1, n-x+1)\), respectively. The corresponding MAP estimators are then given by (Exercise) \[\widehat{\theta}_{MAP(1)} = \left\{ \begin{array}{ll} \left[ \frac{x-1/2}{n-1} \right]_0^1 & \text{if $n>1$};\\ x/n & \text{if $n=1$}. \end{array}\right. \qquad \qquad\text{and}\qquad \widehat{\theta}_{MAP(2)} = \frac{x}{n},\] where \([a]_0^1 = \min\{\max(a,0),1\}\) for \(a\in\mathbb{R}\). The estimators \(\widehat{\theta}_{MAP(1)}\) and \(\widehat{\theta}_{MAP(2)}\) are equivalent as \(n\rightarrow\infty\). \(\;\blacksquare\)
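As a quick numerical sanity check of the two formulas (with made-up values x = 7 and n = 10), one can locate the two posterior modes directly in R:
# numerical check with made-up data: x successes out of n trials
x = 7; n = 10
map1 = optimize(function(t) dbeta(t, x + 1/2, n - x + 1/2),   # Jeffreys-prior posterior
                interval = c(0, 1), maximum = TRUE)$maximum
map2 = optimize(function(t) dbeta(t, x + 1, n - x + 1),       # flat-prior posterior
                interval = c(0, 1), maximum = TRUE)$maximum
c(map1, (x - 1/2)/(n - 1))   # both are about 0.7222
c(map2, x/n)                 # both are 0.7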
$\hat{\theta}_{MAP(1)}$
 Anonymous Pumpkin    Last Modified: 2024-03-29 10:26  0 
We can get $\hat{\theta}_{MAP(2)}$ by taking the derivative of the Beta density kernel with respect to $\theta$, i.e.,
let $f(\theta) = \theta^{\alpha - 1}(1-\theta)^{\beta - 1}$, where $\alpha = x+1$ and $\beta = n - x + 1$; then we have
$$
\frac{\partial f}{\partial \theta} = \theta^{\alpha - 2}(1-\theta)^{\beta-2}\left[ (\alpha-1)(1-\theta) - (\beta-1)\theta \right]
$$
And then we can get $\displaystyle\hat{\theta}_{MAP(2)}=\frac{x}{n}$.

However, how can we get the $\hat{\theta}_{MAP(1)}$ in a similar way?
Exercise 4.1
  [TA] Chak Ming (Martin), Lee    Created at: 2024-03-18 14:48   A4Ex4.1 3
Related last year's exercise and discussion can be found here.
Exercise 1 (Testing and region estimation (60%)). Let \[\begin{aligned} \left[x_1,\ldots,x_n \mid \theta\right] & \overset{ \text{iid}}{\sim} \text{N}(\theta, \theta^2) , \\ \theta &\sim \theta_0\text{Exp}(1),\end{aligned}\] where \(\theta_0=0.5\). Suppose the dataset A4Q1.csv is observed. The goal is to perform inference on \(\theta\).
  1. (10%) Derive and plot the posterior density of \([\theta\mid x_{1:n}]\).
  2. (10%) Compute the maximum a posteriori (MAP) estimator of \(\theta\). Visualize it on the plot.
  3. (10%) Compute the highest posterior density (HPD) credible region (CR) for \(\theta\). Visualize it on the plot.
  4. (10%) Compute the confidence estimator \(\widehat{\alpha}\) under the loss in Example 5.6 with \(I=(1.1, 1.3)\).
  5. (10%) Consider testing \[H_0: \theta \leq 1.2\qquad \text{and} \qquad H_1: \theta>1.2.\] Under the loss in Theorem 4.1 with \(a_0 =1\) and \(a_1=99\), compute the Bayes solution. What is your conclusion?
  6. (10%) Consider testing \[H_0: \theta = 1.2\qquad \text{and} \qquad H_1: \theta\neq 1.2.\] Modify the prior of \(\theta\). Then compute the Bayes factor. What is your conclusion?
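As a rough starting point for part 1 of Exercise 1 above, a grid-based sketch is given below. The column name in A4Q1.csv and the reading of \(\theta\sim\theta_0\text{Exp}(1)\) as an Exponential distribution with mean \(\theta_0=0.5\) are my assumptions; adjust them to match the official setup.
# grid sketch for part 1 (column name "x" and the prior's parametrization are assumptions)
x     = read.csv("A4Q1.csv")$x
theta = seq(from = 0.01, to = 3, length = 1000)
logpost = sapply(theta, function(t)
  sum(dnorm(x, mean = t, sd = t, log = TRUE)) + dexp(t, rate = 1/0.5, log = TRUE))
d = exp(logpost - max(logpost))
d = d / sum(d * (theta[2] - theta[1]))       # normalize via Riemann sum
plot(theta, d, type = "l", xlab = expression(theta), ylab = "posterior density")
theta[which.max(d)]                          # grid approximation of the MAP in part 2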
What does "Modify the prior" mean in (vi)?
 Anonymous Mink    Created at: 2024-03-28 22:23  0 
May I ask what "Modify the prior" means in (vi)?
 Anonymous Mink    Created at: 2024-03-29 12:07  0 
I think I can try my own prior which is easy to calculate. Let me do this, hah.
 Anonymous Hippo    Last Modified: 2024-03-29 16:25  0 
I think this simply means modifying the prior for the point (simple) null hypothesis problem. Please refer to Chapter 4.5.
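For concreteness, one standard way to "modify the prior" for a point null (an illustration, not necessarily the intended official construction) is to put a point mass at the null value and spread the rest over the alternative: \[\pi^*(\mathrm{d}\theta) = p_0\,\delta_{1.2}(\mathrm{d}\theta) + (1-p_0)\,g(\theta)\,\mathrm{d}\theta,\] where \(p_0\in(0,1)\) is the prior probability of \(H_0\) and \(g\) is a density on \(\theta\neq 1.2\) (e.g., the original \(\theta_0\text{Exp}(1)\) prior). The Bayes factor then compares \(f(x_{1:n}\mid \theta=1.2)\) with \(\int f(x_{1:n}\mid\theta)\,g(\theta)\,\mathrm{d}\theta\).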
About Chapter 4
  [Developer] Kai Pan (Ben) Chu    Created at: 0000-00-00 00:00   Chp4General 2 
Questions / discussions related to Chapter 4 should be placed here.
small typo
 Anonymous Mink    Created at: 2024-03-27 13:07  2 

It looks like there is a typo in our in-class notes?
Exercise 2.1
  [TA] Chak Ming (Martin), Lee    Created at: 2024-02-15 21:41   A2Ex1 3
Related last year's exercise and discussion can be found here.
Example 1 ${\color{red}\star}{\color{black}\star}{\color{black}\star}$ Different types of priors (50$\%$) Suppose that a new virus that causes a pandemic was recently discovered. A new vaccine against the new virus was just developed. The aim of this exercise is to estimate the vaccine’s efficacy \(\theta\), which is defined as \[\begin{aligned} \theta &= \frac{\pi_0 - \pi_1}{\pi_1}, \end{aligned}\] where \(\pi_0\) and \(\pi_1\) are the attack rates of unvaccinated and vaccinated humans, respectively. In a preclinical stage, the vaccine was tested on laboratory animals (not humans). The following data were obtained.
                       Unvaccinated animals   Vaccinated animals
Infected animals                         14                    3
Uninfected animals                        1                   17
Now, the vaccine is tested on healthy humans. Suppose there are \(n+m\) people. Among them, \(m\) people are randomly assigned to the control group (\(j=0\)), and the rest to the treatment group \((j=1)\). Denote \[x_{i}^{(j)} = \mathbb{1}(\text{the $i$th person in group $j$ is infected})\] for each \(i\) and \(j\). Let \[[ x_1^{(0)}, \ldots, x_m^{(0)} \mid \pi_0, \pi_1 ] \overset{ \text{iid}}{\sim} \text{Bern}(\pi_0) \qquad \text{and} \qquad [ x_1^{(1)}, \ldots, x_n^{(1)} \mid \pi_0, \pi_1 ] \overset{ \text{iid}}{\sim} \text{Bern}(\pi_1)\] be two independent samples given \(\pi_0, \pi_1\).
  1. (40%) Suggest, with a brief explanation (\(\lesssim\) 20 words each) or mathematical derivation,
    1. a conjugate prior on \((\pi_0, \pi_1)\),
    2. an informative prior on \((\pi_0, \pi_1)\),
    3. a non-informative prior on \((\pi_0, \pi_1)\), and
    4. a weakly-informative prior on \((\pi_0, \pi_1)\).
    Note: you may use other information not provided in the question for parts (b) and (d).
  2. (10%) Emily was a statistician who analyzed the above dataset. After observing the \(m+n\) observations in the clinical stage, she performed a rough analysis and strongly believes that \(\pi_0\geq \pi_1\). So, she uses the following prior: \(f(\pi_0, \pi_1) \propto \mathbb{1}(\pi_0\geq \pi_1).\) Comment. (Use \(\lesssim 50\) words.)
Hints:
  1. You may define a prior for \((\pi_0,\pi_1)\) so that \(\pi_0\) and \(\pi_1\) are independent.
    1. Using the independence of \(\pi_0\) and \(\pi_1\), you can derive the conjugate prior individually.
    2. This part illustrates the power of Bayesian analysis. Data on humans and on animals are likely to be generated by different mechanisms; however, it is still natural to approximate one mechanism by the other. So, you may construct a prior based on the animal dataset to reflect your belief. Some possible methods are suggested below:
      • For \(j=0,1\), construct a conjugate prior for \(\pi_j\) with hyper-parameters selected according to their interpretation in Example 2.13 of the lecture note.
      • For \(j=0,1\), define \(\pi_j \sim \text{Beta}(\alpha^{(j)},\beta^{(j)})\), where \(\alpha^{(j)}\) and \(\beta^{(j)}\) are selected so that \[\begin{aligned} \mathsf{E}(\pi_j) = \widehat{\pi}_j + (\text{Bias} ) \qquad \text{and} \qquad {\mathsf{Var}}(\pi_j) = \widehat{\sigma}^2_j \times (\text{Inflation factor}), \end{aligned}\] where \(\widehat{\pi}_j\) is a frequentist estimator of \(\pi_j\), and \(\widehat{\sigma}^2_j\) is a frequentist estimator of \({\mathsf{Var}}(\widehat{\pi}_j)\) as a proxy of \({\mathsf{Var}}(\pi_j)\). If you firmly believe your location estimate, then you may set the bias to be zero and set the inflation factor to one. You may use the fact that if \(\xi \sim \text{Beta}(\alpha, \beta)\) satisfies \(\mathsf{E}(\xi)=\mu\) and \({\mathsf{Var}}(\xi)=\sigma^2\), then \[\alpha = \frac{\mu^2(1-\mu)}{\sigma^2}-\mu \qquad \text{and} \qquad \beta=\alpha\left(\frac{1}{\mu}-1 \right).\] (A small numerical sketch of this moment-matching step is given after the exercise.)
    3. You may use the Jeffreys prior on each parameter.
    4. Is there any hardly deniable information on \(\pi_0\) and \(\pi_1\)?
  2. What is a proper way to specify an informative prior?
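Regarding the moment-matching hint in the exercise above, a minimal numerical sketch for the unvaccinated group is given below; the choices \(\widehat{\pi}_0 = 14/15\), zero bias and inflation factor 1 are purely illustrative.
# moment-matching sketch (illustrative choices, not the official solution)
m     = 15                     # unvaccinated animals
pi0   = 14/m                   # frequentist estimate of the attack rate pi_0
v0    = pi0*(1 - pi0)/m        # estimated Var(pi0-hat), used as a proxy for Var(pi_0)
alpha = pi0^2*(1 - pi0)/v0 - pi0
beta  = alpha*(1/pi0 - 1)
c(alpha, beta)                 # hyper-parameters of a Beta prior for pi_0 (about 13.1 and 0.93)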
May I ask why centering the priors at 0.8 and 0.2 is not a strong belief? It does not seem to be "hardly deniable" information.
 Anonymous Mink    Created at: 2024-03-01 17:23  1 
 Anonymous Hippo    Created at: 2024-03-29 16:24  0 
I guess only the assumption about the support counts as a strong belief; adjusting the hyper-parameters to center the prior where we believe it should be is not that strong a belief.
Example 2.2
  [TA] Chak Ming (Martin), Lee    Created at: 0000-00-00 00:00   Chp2Eg2 3 
Example 2.2 (${\color{blue}\star}{\color{gray}\star}{\color{gray}\star}$ Transforming a location parameter to a scale parameter). Invariant priors for location and scale parameters in Theorem 2.1 do NOT contradict each other. Indeed, we can derive the less intuitive result (2) from the more trivial result (1). Let \([x\mid \theta] \sim f(\cdot\mid \theta)= f_0(\cdot/\theta)/\theta\) as in (\ref{eqt:scaleFamily}). Then we can represent \(x\) as \[x = \theta z,\] where \(z \sim f_0(\cdot)\). It implies that \[\log x = \log z + \log\theta,\] which can be viewed as a location model for \(y =\log x\) with a location parameter \(\phi = \log\theta\). By Theorem 2.1 (1), the invariant prior of \(\phi\) is \[\begin{align}\label{eqt:prior_phi_in_eg} f_{\phi}(\phi) \propto 1. \tag{2.4}\end{align}\] Using the transformation \(\phi = \log \theta\), we can derive back the PDF of \(\theta\) from (\ref{eqt:prior_phi_in_eg}). Since \(\text{d}\phi/\text{d}\theta = 1/\theta\), we have \[f_{\theta}(\theta) \;\propto\; \frac{1}{\theta} f_{\phi}(\log\theta) \;\propto\; \frac{1}{\theta},\] which gives us back part (2) of Theorem 2.1.\(\;\blacksquare\)
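A small simulation check of the representation \(x = \theta z\); here \(f_0\) is taken to be the Exp(1) density and \(\theta = 2\), both purely for illustration.
# simulation check of x = theta*z (illustrative choices: f0 = Exp(1) density, theta = 2)
set.seed(1)
theta = 2
z = rexp(1e5)                                  # z ~ f0
x = theta * z
hist(x, breaks = 100, freq = FALSE, main = "x = theta*z versus f0(x/theta)/theta")
curve(dexp(x/theta)/theta, add = TRUE, col = "red4", lwd = 2)   # matches the histogram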
how to represent x
 Anonymous Warbler    Created at: 2023-02-16 16:06  1 
I still cannot fully understand why $x=\theta z$.
We have $z\sim f_0(\cdot)$ and $[x\mid\theta]\sim f_0(\cdot/\theta)/\theta$.
If $x=\theta z$, does that mean $x\sim \theta f_0(\cdot)$? And how should I interpret $[x\mid\theta]\sim f_0(\cdot/\theta)/\theta$?

Transformation
 Anonymous Pumpkin    Last Modified: 2024-03-29 12:03  1 
Why $\displaystyle\frac{1}{\theta}f_{\phi}(\log\theta)\propto\frac{1}{\theta}$?
$\log\theta$ also contains $\theta$?
About Chapter 1
  [Developer] Kai Pan (Ben) Chu    Created at: 0000-00-00 00:00   Chp1General 2 
Questions / discussions related to Chapter 1 should be placed here.
meaning of Scissors
 Anonymous Rabbit    Created at: 2022-01-22 00:00  1 
In the lecture note, the scissors symbol appears many times.
What is the meaning of this symbol?
  [Instructor] Kin Wai (Keith) Chan    Created at: 2022-01-23 12:00  1 
Optional materials are indicated by scissors. For more information about the symbols, please refer to the instructions in the lecture note on the course website.
ancillary statistic
 Anonymous Loris    Created at: 2022-01-26 00:00  2 
Is there any difference between a pivotal quantity and an ancillary statistic (mentioned in the tutorial)?
  [TA] Di Su    Created at: 2022-01-26 12:00  4 
A pivotal quantity may not be a statistic (a function of data) because it may involve unknown parameters. If a pivotal quantity is indeed a statistic, then it is called an ancillary statistic.
For example, suppose $X_1,\dots,X_n\overset{iid}{\sim} N(\theta,1)$ where $\theta\in\mathbb{R}$ is an unknown parameter. Using representation, $X_1-\theta=Z_1$ for some $Z_1\sim N(0,1)$ whose distribution is independent of $\theta$, hence the quantity $X_1-\theta$ is a pivotal quantity. However, $X_1-\theta$ is not a statistic since $\theta$ is unknown, hence not an ancillary statistic. On the other hand, the quantity $X_1-\bar{X}$ is pivotal and free of the unknowns, hence an ancillary statistic.
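A quick simulation sketch of this point (the sample size and the two values of \(\theta\) are arbitrary): the distribution of \(X_1-\bar{X}\) does not change with \(\theta\).
# simulation sketch: distribution of X1 - Xbar is free of theta (illustrative values)
set.seed(1)
sim = function(theta, n = 5, B = 1e4)
  replicate(B, { x = rnorm(n, mean = theta); x[1] - mean(x) })
qs = c(0.1, 0.5, 0.9)
rbind(quantile(sim(0), qs), quantile(sim(10), qs))   # the two rows are nearly identical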
  [Instructor] Kin Wai (Keith) Chan    Created at: 2022-01-26 12:00  2 
Di explained very well! If you wish to know more about ancillary statistics and pivotal quantities, you may refer to my Stat4003 lecture notes (see here):
You are kindly reminded that these concepts will not be tested directly in Stat4010. But it is always a good idea to learn more Statistics!
Posterior as prior (2021 spring midterm Q1)
 Anonymous Orangutan    Created at: 2022-02-27 11:00  5 
To derive the posterior predictive, we can use the posterior distribution as the prior and plug it into the prior predictive.
Here is my question: why is $p_j$ still used instead of $p_j^{(n)}$ in this case?



  [TA] Di Su    Created at: 2022-03-01 14:37  0 
Thanks for pointing it out. Yes, it should be $p_j^{(n)}$.
About range of parameter
 Anonymous Loris    Last Modified: 2022-04-28 01:19  2 
I notice that when updating our belief about $\theta$ using observed data, the range of the parameter in our prior belief is never changed.
Algebraically, this is because the indicator cannot be discarded when doing the calculation (i.e., posterior $\propto$ prior $\times$ sampling distribution).
I am wondering whether there is an intuitive way to understand why our belief about the range of $\theta$ cannot be modified by data?
Thank you very much!
 Anonymous Ifrit    Last Modified: 2022-04-28 01:13  2 
I think below is the rough idea, algebraically. The posterior is obtained by multiplying the prior and the sampling distribution (and a normalizing constant). As the values of the prior outside the range of the parameter are 0, no matter what values of the sampling distribution we multiply with them, the resulting values of the posterior outside the range are still 0 (0 multiplied by anything is 0).
 Anonymous Loris    Last Modified: 2022-04-28 13:36  2 
Yes, I think this explains the reason from the algebraic perspective. Thanks a lot :)
From my point of view, intuitively, this kind of shows one limitation of the Bayesian approach: we must correctly specify the support of the parameter in the prior. Otherwise, we may fail to cover the true model.
I am not sure if this explanation is good. Maybe there are better intuitions :)
 Anonymous Ifrit    Last Modified: 2022-04-29 15:53  3 
In my opinion, it is not a limitation of the Bayesian approach itself, because Bayesian analysis does not require that we specify a (bounded) support for the parameter in the prior. If we are afraid that we cannot specify the correct support of the distribution of the parameter, then why don't we just let the prior distribution have unbounded support? For example, suppose we are very sure that theta falls between 0 and 1 but, to play safe, do not want to eliminate the possibility that theta falls outside that range. Then we could construct a prior like this:
f(theta) puts probability 0.99999 on 0 < theta < 1 (say, uniformly), and spreads the remaining 0.00001 thinly over the rest of the real line.
So the prior keeps our strong belief while not eliminating the possibility that theta falls outside the range.
After incorporating the data to compute the posterior, the data can somehow correct us if our belief is wrong (for example, see the solution of the optional part of A2). I think this type of prior (with unbounded support, highly concentrated on some region(s) while having extremely low density everywhere else) shows the full flexibility of the Bayesian approach.
So in this sense, having bounded support in the prior is actually quite a strong belief? I think unless we are 100% sure that theta must not fall into some region, a prior with unbounded support is a safer choice.

Frequentist vs Bayesian
 Anonymous Auroch    Created at: 2024-01-26 16:52  4 
I remember the lecture notes say that the frequentist point of view can be understood as Bayesian with a very strong prior belief, and that the Bayesian calculations will then not be meaningful because the posterior equals the prior. I wonder if the roles can be reversed. For instance, can we set up a null hypothesis with a random theta and test it using the frequentist philosophy (e.g., theta is uniformly distributed between 0.4 and 0.6)? I know we are taught that treating theta as random is already Bayesian, but can we still do some frequentist-like calculations?
Constant Prior
 Anonymous Mink    Last Modified: 2024-01-31 22:19  3 
Can I say the prior is the "true" statement if, after observing data any number of times, the posterior always stays equal to the prior?
Prior Predictive
 Anonymous Pumpkin    Created at: 2024-02-29 22:10  0 
Why is the prior predictive defined as $f(x_{1:n})$ in Chapter 1, Page 3 of the lecture notes, but as $f(x_{n+1})$ in the handwritten notes?
Example 1.8
  [TA] Chak Ming (Martin), Lee    Created at: 0000-00-00 00:00   Chp1Eg8 2 
Example 1.8 (${\color{blue}\star}{\color{gray}\star}{\color{gray}\star}$ Hypothesis testing ). Let \([x_1,\ldots ,x_n\mid \theta] \overset{ \text{iid}}{\sim} \text{Bin}(N,\theta)\) with known \(N\). Test \[H_0: \theta\leq 0.5\qquad \text{against}\qquad H_1:\theta>0.5.\]

$\bigstar~$Solution:

  1. (Bayesian) Assume a continuous prior on \(\theta\).
    • Our prior belief on the truthfulness of \(H_0\) and \(H_1\) are respectively \[\begin{align} \mathsf{P}(H_0) &:=& \mathsf{P}(\theta\leq 0.5) = \int_0^{0.5} \texttt{dbeta}(\theta\mid\alpha,\beta)\, \text{d}\theta\\ \mathsf{P}(H_1) &:=& \mathsf{P}(\theta> 0.5) = \int_{0.5}^{1} \texttt{dbeta}(\theta\mid\alpha,\beta)\, \text{d}\theta, \end{align}\] where \(\texttt{dbeta}(\cdot\mid\alpha,\beta)\) is the density of \(\text{Beta}(\alpha,\beta)\).
    • Our posterior belief on the truthfulness of \(H_0\) and \(H_1\) are updated to \[\begin{align} \mathsf{P}(H_0 \mid x_{1:n}) &=& \mathsf{P}(\theta\leq 0.5\mid x_{1:n}) = \int_0^{0.5} \texttt{dbeta}(\theta,\alpha_n,\beta_n)\, \text{d}\theta; \\ \mathsf{P}(H_1 \mid x_{1:n}) &=& \mathsf{P}(\theta> 0.5\mid x_{1:n}) = \int_{0.5}^1 \texttt{dbeta}(\theta,\alpha_n,\beta_n)\, \text{d}\theta , \end{align}\]
    • Then we may reject \(H_0\) if \(\mathsf{P}(H_1 \mid x_{1:n})/\mathsf{P}(H_0 \mid x_{1:n})\) is large compared with \(\mathsf{P}(H_1)/\mathsf{P}(H_0)\), i.e., \[\text{Reject $H_0$} \qquad \Leftrightarrow \qquad B_{10} := \frac{\mathsf{P}(H_1 \mid x_{1:n})}{\mathsf{P}(H_0 \mid x_{1:n})} \bigg/ \frac{\mathsf{P}(H_1)}{\mathsf{P}(H_0)} \text{ is large}.\] The value \(B_{10}\) is called the Bayes factor (BF) serving as a “test statistic”. Note that the BF is just a one-to-one transformation of \(\mathsf{P}(H_0 \mid x_{1:n})\).
  2. (Frequentist) Suppose that \[\sqrt{n}(\widetilde{\theta}-\theta)/\widetilde{\sigma} \overset{ \text{d} }{\rightarrow} \text{N}(0,1),\] where \(\widetilde{\theta}\) is an estimator of \(\theta\), and \(\widetilde{\sigma}^2\) is an estimator of \({\mathsf{Var}}(\widetilde{\theta})\). Then our test decision follows the rule: \[\text{Reject $H_0$} \qquad \Leftrightarrow \qquad T := \frac{\widetilde{\theta}-0.5}{\widetilde{\sigma}} \text{ is large}.\] Alternatively, one may compute the \(p\)-value \[p := \mathsf{P}_0\left( T > c \right),\] where \(\mathsf{P}_0\) is the probability measure under \(H_0\) and \(c\) is a critical value. In this case, \(H_0\) is rejected if \(p\) is small. \(\;\blacksquare\)


$\bigstar~$Intuition:

Philosophy  | Statistic for testing \(H_0\)            | Interpretation
Bayesian    | \(\mathsf{P}(H_0\mid x_{1:n})\)          | the posterior probability that \(H_0\) is true
Frequentist | \(\mathsf{P}\{ T(x_{1:n})>c \mid H_0\}\) | the \(p\)-value, i.e., the probability of observing extreme events under \(H_0\)


$\bigstar~$Takeaway:
The Bayesian test is intuitive: it evaluates the probability that \(H_0\) is true given the data.

$\bigstar~$Experiment:
If \(\theta\sim \text{Beta}(\alpha, \beta)\), the Bayes factor can be computed in R as follows.
# hypothetical example inputs (made up for illustration)
set.seed(1)
n = 20; N = 10                          # n observations, each Bin(N, theta)
alpha = 1/2; beta = 1/2                 # Beta(alpha, beta) prior hyper-parameters
x = rbinom(n, size = N, prob = 0.6)     # simulated data
S = sum(x)                              # sufficient statistic
alpha1 = alpha + S                      # posterior is Beta(alpha1, beta1)
beta1  = beta + n*N - S
odds0 = (1-pbeta(0.5,alpha,beta))/pbeta(0.5,alpha,beta)        # prior odds P(H1)/P(H0)
odds1 = (1-pbeta(0.5,alpha1,beta1))/pbeta(0.5,alpha1,beta1)    # posterior odds
odds1/odds0                # Bayes factor B10

Bayes Factor
 Anonymous Pumpkin    Created at: 2024-02-29 22:07  0 
  1. What is the intuition behind requiring $P(H_1\mid x_{1:n})/P(H_0\mid x_{1:n})$ to be large compared with $P(H_1)/P(H_0)$? Why are we comparing these two quantities?
  2. What does it mean that the BF is just a one-to-one transformation of $P(H_0\mid x_{1:n})$?

Example 3.3
  [TA] Chak Ming (Martin), Lee    Created at: 0000-00-00 00:00   Chp3Eg3 0 
Example 3.3 (${\color{blue}\star}{\color{gray}\star}{\color{gray}\star}$ Regression). Suppose that the covariates \(x_1,\ldots, x_n\) are fixed and known. Let \[\begin{align} \left[y_i \mid \theta\right] & \overset{ {\perp\!\!\!\!\perp } }{\sim} & \text{N}(\theta x_i , \sigma_0^2), \qquad i=1, \ldots, n;\\ \theta &\sim&\text{N}(\theta_0, \tau_0^2), \end{align}\] where \(\sigma_0, \tau_0>0\) and \(\theta_0\in\mathbb{R}\) are given. Then the posterior of \(\theta\) is given by \[\begin{align} f(\theta\mid y_{1:n}) &\propto& f(y_{1:n}\mid \theta) f(\theta) \\ &\propto& \exp\left\{ - \frac{1}{2\tau_0^2} (\theta-\theta_0)^2\right\} \prod_{i=1}^n \exp\left\{ -\frac{1}{2\sigma_0^2}(y_i - x_i \theta)^2 \right\} \\ &\propto& \exp\left\{ -\frac{1}{2}\left[ \left( \frac{1}{\tau_0^2} + \frac{\sum x_i^2}{\sigma_0^2} \right)\theta^2 - 2\left( \frac{\theta_0}{\tau_0^2} + \frac{\sum x_i y_i }{\sigma_0^2} \right)\theta\right] \right\}.\end{align}\] By Lemma 1.2, we know that \[\left[ \theta\mid y_{1:n} \right] \sim \text{N}\left( \frac{B}{A}, \frac{1}{A}\right), \qquad \text{where} \qquad A = \frac{1}{\tau_0^2} + \frac{\sum x_i^2}{\sigma_0^2} \qquad \text{and} \qquad B = \frac{\theta_0}{\tau_0^2} + \frac{\sum x_i y_i }{\sigma_0^2}.\] The MAP estimator of \(\theta\) is \[\widehat{\theta}_{MAP} = \frac{B}{A} = \frac{\theta_0({\sigma_0^2}/{\tau_0^2}) + {\sum x_i y_i }}{({\sigma_0^2}/{\tau_0^2}) + {\sum x_i^2} }. \tag*{$\blacksquare$}\]
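A minimal numerical check of the closed-form MAP; all data and hyper-parameter values below are made up for illustration.
# numerical check of the MAP formula with made-up data and hyper-parameters
set.seed(1)
n = 50; sigma0 = 1; tau0 = 2; theta0 = 0
x = runif(n)
y = rnorm(n, mean = 0.8*x, sd = sigma0)
A = 1/tau0^2 + sum(x^2)/sigma0^2
B = theta0/tau0^2 + sum(x*y)/sigma0^2
B/A                                                                 # posterior mean = MAP
(theta0*sigma0^2/tau0^2 + sum(x*y))/(sigma0^2/tau0^2 + sum(x^2))    # same value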
Why $[\theta\mid y_{1:n}]\sim \text{N}(B/A,\, 1/A)$?
 Anonymous Armadillo    Created at: 2024-02-28 22:07  0 
For $[\theta\mid y_{1:n}]\sim \text{N}(B/A,\, 1/A)$, I would like to ask why the mean is $B/A$ and the variance is $1/A$. Thank you!
 Anonymous Pumpkin    Created at: 2024-03-29 10:36  0 
Please refer to Chapter 1, Page 10, Lemma 1.2
