Exercise 4.1
  [TA] Di Su    Created at: 0000-00-00 00:00   A4Ex4.1 3 
Related last year's exercise and discussion can be found here.
Exercise 1 (Testing and region estimation (60%)). Let \[\begin{aligned} \left[ x_1, \ldots, x_n \mid \theta \right] &\overset{ \text{iid}}{\sim} \text{N}(\theta, \theta^2) , \\ \theta &\sim \theta_0\text{Exp}(1),\end{aligned}\] where \(\theta_0=0.5\). Suppose the dataset A4Q1.csv is observed. The goal is to perform inference on \(\theta\).
  1. (10%) Derive and plot the posterior density of \([\theta\mid x_{1:n}]\).
  2. (10%) Compute the maximum a posteriori (MAP) estimator of \(\theta\). Visualize it on the plot.
  3. (10%) Compute the highest posterior density (HPD) credible region (CR) of \(\theta\). Visualize it on the plot.
  4. (10%) Compute the confidence estimator \(\widehat{\alpha}\) under the loss in Example 5.6 with \(I=(1.1, 1.3)\).
  5. (10%) Consider testing \[H_0: \theta \leq 1.2\qquad \text{and} \qquad H_1: \theta>1.2.\] Under the loss in Theorem 4.1 with \(a_0 =1\) and \(a_1=99\), compute the Bayes solution. What is your conclusion?
  6. (10%) Consider testing \[H_0: \theta = 1.2\qquad \text{and} \qquad H_1: \theta\neq 1.2.\] Modify the prior of \(\theta\). Then compute the Bayes factor. What is your conclusion?
question 1
 Anonymous Gopher    Created at: 2023-03-24 14:52  0 
In the first question, I could not find a named distribution for the posterior density. Can I just leave the kernel as the final answer? Thank you!
About Chapter 3
  [Developer] Kai Pan (Ben) Chu    Created at: 0000-00-00 00:00   Chp3General 0 
Questions and discussions related to Chapter 3 should be placed here.
Tutorial 4 Example 2.2
 Anonymous Orangutan    Created at: 2022-02-17 17:39  1 
I have trouble understanding the standardization step.

Why is $E(\theta_j\mid y_{1:n}) = \mu_j -\sigma_j \frac{\texttt{dnorm}(u_j)-\texttt{dnorm}(l_j)}{\texttt{pnorm}(u_j)-\texttt{pnorm}(l_j)}$?
1. Why can't we simply use $\mu_j$?
2. Why is the term $\frac{\texttt{dnorm}(u_j)-\texttt{dnorm}(l_j)}{\texttt{pnorm}(u_j)-\texttt{pnorm}(l_j)}$ used in the standardization, and what do $u_j$ and $l_j$ mean?

Sorry for my abstract question. Thanks.
  [TA] Di Su    Last Modified: 2022-02-18 17:48  2 
Thanks for your questions. It comes from Tutorial 4 Example 2.2 Q2. (Please kindly mention which question you are referring to so that others can also take a look :) )
  1. We cannot directly use $\mu_j,j=1,2$ because $\hat{\theta}^{\pi}_j$ follows a $\href{https://en.wikipedia.org/wiki/Truncated_normal_distribution}{\text{truncated normal distribution}}$ instead of a normal distribution.
  2. It is a property of the truncated normal distribution. Compared to a normal distribution, an upper bound and a lower bound are imposed on the support of the truncated normal distribution. And $u_j,\ell_j$ are the z-scores of its upper bound and lower bound, respectively.
 Anonymous Monkey    Created at: 2022-03-11 13:48  1 
So how can we arrive at that equation? Not quite sure how to deal with it to get the pnorm formula.
  [TA] Di Su    Created at: 2022-03-23 21:50  3 
Let's use TN$(\mu,\sigma^2;a,b)$ to denote the truncated normal distribution that restricts $N(\mu,\sigma^2)$ on $[a,b]$. Assume $X\sim$ TN$(\mu,\sigma^2;a,b)$. Then $f(x)\propto\phi(x;\mu,\sigma^2),x\in[a,b]$ where $\phi(\cdot;\mu,\sigma^2)$ denotes the density of a normal random variable with mean $\mu$ and variance $\sigma^2$. We have the following observations:
  1. Denote the standard normal density as $\phi(\cdot)$. The density of $X$ is given by (exercise) $$f_X(x) = \frac{\phi((x-\mu)/\sigma)}{\{\texttt{pnorm((b-$\mu$)/$\sigma$)}-\texttt{pnorm((a-$\mu$)/$\sigma$)}\}\sigma},\quad x\in[a,b].$$
  2. Let $Z:=(X-\mu)/\sigma.$ Then $Z\sim\mathrm{TN}(0,1;\alpha,\beta)$ where $\alpha = (a-\mu)/\sigma, \beta =(b-\mu)/\sigma$. And $$f_Z(z) = \frac{\phi(z)}{\texttt{pnorm($\beta$)}-\texttt{pnorm($\alpha$)}},\quad z\in[\alpha,\beta].$$
  3. The expectation of $Z$ is $$E(Z) = \int_{\alpha}^{\beta}\frac{z\phi(z)}{\texttt{pnorm($\beta$)}-\texttt{pnorm($\alpha$)}}\mathrm{d}z=\frac{\int_{\alpha}^{\beta}\frac{z\exp(-z^2/2)}{\sqrt{2\pi}}\mathrm{d}z}{\texttt{pnorm($\beta$)}-\texttt{pnorm($\alpha$)}}=\frac{\int_{\alpha^2/2}^{\beta^2/2}\frac{\exp(-u)}{\sqrt{2\pi}}\mathrm{d}u}{\texttt{pnorm($\beta$)}-\texttt{pnorm($\alpha$)}}=\frac{\texttt{dnorm($\alpha$)}-\texttt{dnorm($\beta$)}}{\texttt{pnorm($\beta$)}-\texttt{pnorm($\alpha$)}}.$$
Therefore, $$E(X) = \sigma E(Z)+\mu=\sigma\frac{\texttt{dnorm($\alpha$)}-\texttt{dnorm($\beta$)}}{\texttt{pnorm($\beta$)}-\texttt{pnorm($\alpha$)}}+\mu.$$
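The closed form above can be checked numerically. The following is a minimal R sketch, not part of the tutorial: the helper names `tn_mean` and `tn_mean_numeric` and the values of `mu`, `sigma`, `a`, `b` are made up for illustration. It compares the formula with direct numerical integration of $x f_X(x)$ over $[a,b]$.

```r
# Mean of N(mu, sigma^2) truncated to [a, b], using the closed form derived above
tn_mean = function(mu, sigma, a, b) {
  alpha = (a - mu) / sigma   # z-score of the lower bound
  beta  = (b - mu) / sigma   # z-score of the upper bound
  mu + sigma * (dnorm(alpha) - dnorm(beta)) / (pnorm(beta) - pnorm(alpha))
}

# Numerical check: integrate x * f_X(x) over [a, b]
tn_mean_numeric = function(mu, sigma, a, b) {
  norm_const = pnorm(b, mu, sigma) - pnorm(a, mu, sigma)
  integrand = function(x) x * dnorm(x, mu, sigma) / norm_const
  integrate(integrand, lower = a, upper = b)$value
}

# Hypothetical parameter values, chosen only for illustration
closed  = tn_mean(mu = 1, sigma = 2, a = 0, b = 3)
numeric = tn_mean_numeric(mu = 1, sigma = 2, a = 0, b = 3)
print(c(closed = closed, numeric = numeric))
```

The two numbers should agree up to the tolerance of `integrate`. When the truncation interval is symmetric about $\mu$, the correction term vanishes and the mean is $\mu$ itself.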
Questions Regarding Tutorial 5
 Anonymous Grizzly    Last Modified: 2022-02-27 22:04  1 
$\textbf{Proof of Theorem 1.3}$
$$\text{R}(\pi,\widehat{\theta}_\pi)=\int_\mathscr{X}\int_\Theta \left[\widehat{\theta}_\pi-\theta\right]^2 dF(\theta\mid x)dF(x)$$
Why can we write $dF(\theta\mid x) dF(x)$? Is it the same as $dF(x\mid\theta)dF(\theta)$?

$\textbf{Example 1.1}$
$$\widehat{\theta}_\pi=\text{E}[\theta\mid x_{1:n}]=\frac{\tau_0^2}{\sigma^2+\tau_0^2}\overline{x}+\frac{\sigma^2}{\sigma^2+\tau_0^2}\nu_0$$
Why is n missing in the above expression?

$\textbf{Example 1.2}$
$$\rho(a,b)\geq\rho\left(0,-b/(a-1)\right) \quad\text{for } a<0$$
$$\therefore a\overline{x}+b\quad\text{is uniformly dominated by } -b/(a-1) \text{ for } a<0.$$
If we have set $a=0$ in $\rho\left(0,-b/(a-1)\right)$, is $-b/(a-1)=-b/(0-1)=b$? Therefore, $a\overline{x}+b$ is uniformly dominated by $b$ for $a<0$. Is that correct?

Thank you very much.

  [TA] Cheuk Hin (Andy) Cheng    Created at: 2022-02-27 23:20  3 
Thanks for the questions.
  1. We write it in that form because the densities may not exist. If the joint, marginal and conditional densities all exist, we can exchange the order. We put the posterior in the inner integral for computational convenience. Note that under the L2 loss, $\hat{\theta}_\pi = E[\theta \mid x_{1:n}]$. Therefore, the inner integral equals $Var(\theta \mid x_{1:n})$. By our assumption, it is free of $x_{1:n}$, so we can take it out of the outer integral, which makes the rest of the calculation easier.
  2. You are correct. There is a typo in the equation, but the conclusion is the same. We will update the tutorial note shortly. Thanks for pointing it out.
  3. Here, $a$ cannot be $0$ since $a < 0$ is considered in this scenario. The subsequent argument is thus not valid.
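For reference, here is a sketch of the Example 1.1 expression with $n$ restored. It assumes the usual normal-normal model $[x_1,\ldots,x_n\mid\theta]\overset{\text{iid}}{\sim}\text{N}(\theta,\sigma^2)$ with $\theta\sim\text{N}(\nu_0,\tau_0^2)$; the tutorial's exact setup is not reproduced in this thread, so treat the model as an assumption.

$$\widehat{\theta}_\pi=\text{E}[\theta\mid x_{1:n}]=\frac{n\tau_0^2}{\sigma^2+n\tau_0^2}\overline{x}+\frac{\sigma^2}{\sigma^2+n\tau_0^2}\nu_0, \qquad \text{Var}(\theta\mid x_{1:n})=\frac{\sigma^2\tau_0^2}{\sigma^2+n\tau_0^2}.$$

Under this model the posterior variance is free of $x_{1:n}$, which is the property used in point 1, and setting $n=1$ recovers the expression quoted in the question.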
Mock 2021 spring Q3
 Anonymous Orangutan    Created at: 2022-03-08 19:04  0 
How can we derive A2 and A3?
Although I can prove this representation is correct by looking at answer, I can't come up with this idea.

 Anonymous Orangutan    Created at: 2022-03-09 18:30  1 
I asked in tutorial and tried, but still couldn't get the same form.
Please explain again or write down the procedure.
  [TA] Di Su    Last Modified: 2022-03-09 22:04  1 
Consider the two steps:
  1. write $\left(x_{(n)}-x_{(1)}\right)^{3-n}$ as $\left(x_{(n)}-x_{(1)}\right)/\left(x_{(n)}-x_{(1)}\right)^{n-2}$. And write other terms similarly.
  2. Multiply both the numerator and the denominator by $(x_{(n)}-x_{(1)})^{n-2}$.
You will be able to handle the algebra then.
Q3 spring 2021 spring
 Anonymous Orangutan    Created at: 2022-03-09 09:54  1 
I tried to simulate the answer, but couldn't get the same result.
1. Where are the mistakes in my code?
2. Some researchers find that $\theta$ is always positive. Then, is the new prior $1(0<\theta_0<\theta_1)$ reasonable for mitigating the strong previous prior?
nRep = 2^10
t1.all = seq(0,1.2,0.1)
t0.all = seq(-0.2,1.2, 0.01) # x-axis
## A function to compute the difference in squared error (MLE minus BE)
get_FR_dif = function(x,t1, t0){
  n = length(x) # sample size (assumed; n was not defined in the original post)
  max = max(x)
  min = min(x)
  MLE = max-min
  A1 = (n-1)/(n-3)
  A2 = (max-min)^(n-2) - (1-min)*((max-min)/(1-min))^(n-2)-max*((max-min)/max)^(n-2)
  A3 = (max-min)^(n-2) - ((max-min)/(1-min))^(n-2)-((max-min)/max)^(n-2)
  BE = A1 * ((max-min) + A2) / (1+A3)
  MLE_risk = (MLE-(t1-t0))^2
  BE_risk = (BE-(t1-t0))^2
  Risk_dif = MLE_risk-BE_risk
  return(Risk_dif)
}

# Simulation
out = array(NA,dim=c(nRep, length(t0.all), length(t1.all)))
dimnames(out) = list(paste0("iRep=",1:nRep),paste0("theta=",t0.all),paste0("theta1=",t1.all))
for(iRep in 1:nRep) {
  for(i.t0 in 1:length(t0.all)){
    theta0 = t0.all[i.t0] # true theta0
    for(i.t1 in 1:length(t1.all)){
      theta1 = t1.all[i.t1] # true theta1
      x = runif(1000,min=theta0, max=theta1)
      out[iRep, i.t0, i.t1] = get_FR_dif(x, t1=theta1, t0=theta0)
    }
  }
}
# Results
risk = apply(out, 2:3, mean)
risk_adj = log(risk)
col = colorRampPalette(c("red","blue"))(length(t1.all))
matplot(t0.all, risk, type="b", pch=20, main="Risk", xlab=expression(theta))
  [TA] Di Su    Last Modified: 2022-03-23 22:33  1 
2021 Spring Mock Midterm Q3.3
The for loop of $\texttt{i.t1}$ is not correct. It violates the requirement that $\theta_0<\theta_1$. You may modify that loop as
 for(i.t1 in which(t1.all>theta0)[1]:length(t1.all)){
(There are also other ways to write the for loop.)
It is a good practice to try the codes yourself, keep up!
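One possible alternative way to write that loop is sketched below. It iterates directly over the indices returned by `which()`, so the loop body is simply skipped when no grid value of $\theta_1$ exceeds $\theta_0$ (for example, at the largest value in `t0.all`), whereas a range built from `which(...)[1]` would start at `NA` in that case. The helper name `valid_t1` is made up for illustration.

```r
t1.all = seq(0, 1.2, 0.1)
t0.all = seq(-0.2, 1.2, 0.01)

# Indices of t1.all satisfying theta0 < theta1; the result may be empty
valid_t1 = function(theta0, t1.all) which(t1.all > theta0)

# Try a few values of theta0 from the grid, including its two end points
for (theta0 in c(-0.2, 0.55, 1.2)) {
  idx = valid_t1(theta0, t1.all)
  for (i.t1 in idx) {            # skipped entirely when idx is empty
    theta1 = t1.all[i.t1]
    stopifnot(theta0 < theta1)   # the requirement theta0 < theta1 always holds here
  }
  cat("theta0 =", theta0, ": number of valid theta1 values =", length(idx), "\n")
}
```

Entries of `out` for invalid pairs then stay `NA`, so they need to be handled when averaging, for instance with `na.rm = TRUE`.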
2019 M0 BE
 Anonymous Orangutan    Created at: 2022-03-10 07:52  0 
I am not sure why IV is correct.
Bayes estimators in this question are 2,3 and 6?

  [TA] Di Su    Last Modified: 2022-03-10 11:42  1 
  • The other estimators will always have a larger Bayesian risk because each of them is inadmissible.
  • In this problem, we have $R(\pi,\hat{\theta})=\pi(\theta_1)R(\theta_1,\hat{\theta})+\pi(\theta_2)R(\theta_2,\hat{\theta})$.
    • For $\hat{\theta}_2$, it is the minimizer of $R(\theta_1,\hat{\theta})$, so we can design $\pi(\theta_1)=1, \pi(\theta_2)=0$ and $\hat{\theta}_2$ will be the corresponding Bayes estimator.
    • The same idea is applicable to $\hat{\theta}_3$.
    • Noticing that $\sum_{i=1}^2R(\theta_i,\hat{\theta}_2)=\sum_{i=1}^2R(\theta_i,\hat{\theta}_6)<\sum_{i=1}^2R(\theta_i,\hat{\theta}_3)$, we can design $\pi(\theta_1)=1/2, \pi(\theta_2)=1/2,$ and $\hat{\theta}_2,\hat{\theta}_6$ are the corresponding BEs.
Cramér–Rao bound, Sufficiency and Completeness
 Anonymous Moose    Created at: 2022-03-28 15:46  1 
In STAT4003, we have learnt Cramér–Rao bound, Sufficiency and Completeness to assess the quality of an estimator or in general a statistic. But in this course we mainly focus on the decision theory. How do those concepts relate to each other?
In particular, I found that the concept of completeness is really confusing. Why do we need such definition in practice?
  [TA] Cheuk Hin (Andy) Cheng    Last Modified: 2022-03-29 00:13  2 
This question is more related to STAT4003. Since it is outside the syllabus, I will answer it briefly.
First of all, completeness is a property of a statistic in relation to a family of probability distributions. Roughly speaking, it says that the only function of the statistic whose mean is 0 for every parameter value is the zero function. Say our parameter of interest is $\theta$. One motivation for considering a complete sufficient statistic (CSS) is that a minimal sufficient statistic (MSS), the simplest form of sufficient statistic (SS), is not necessarily independent of an ancillary statistic (AS, a statistic that contains no information about $\theta$). This is counterintuitive because we would expect a sufficient statistic to contain “all” the information about $\theta$ and thus to be independent of any ancillary statistic. Indeed, we need a stronger condition on the MSS for it to be independent of the AS, and this is where completeness comes in (take a look at Basu's theorem). Note also that a CSS is an MSS under very mild assumptions. Some people also view completeness as a tool, or a stronger assumption, for proving stronger results about the (sufficient) statistic they are interested in.

I think completeness can be related to decision theory. Recall that under this framework, a frequentist makes the decision that minimizes the risk. One use of completeness together with decision theory is the Lehmann-Scheffe theorem: if $T(x)$ is a CSS and the unbiased estimator $g(x)$ depends on the data only through $T(x)$, then $g(x)$ is the UMRUE. Here, we have used decision theory in the following sense: we specify our utility or loss, then we make the decision that minimizes the expected loss, i.e., the risk (in the above case, a convex loss with a very large penalty on bias). The Lehmann-Scheffe theorem is also an example of why we need to care about CSS in practice, since it tells us the “best” point estimator.
About Invariance
 Anonymous Bobolink    Created at: 2023-02-26 00:37  0 
We learned the definition of invariance and talked about how to construct an invariant prior. It means that the prior stays the same under reparametrization. But so far we have never used reparametrization, and I don't know in which situations we should use it. So could I ask what the practical benefits of invariance are?
Tutorial 5 Example 2.1
 Anonymous Dolphin    Created at: 2023-03-20 21:23  0 
I want to prove that a Bayes estimator is minimax.
Can I prove it by claiming that the frequentist risk is constant over all theta?
Given a prior and the corresponding Bayes estimator,
the assumption that the Bayes risk is always larger than or equal to the frequentist risk for any theta suggests that the frequentist risk is constant,
because the Bayes risk is the average of the frequentist risk with respect to theta.
If the average is always larger than or equal to every possible outcome of the random variable, then the random variable has to be constant.
Exercise 3.2
  [TA] Cheuk Hin (Andy) Cheng    Created at: 0000-00-00 00:00   A3Ex3.2 0 
Let \([x \mid \theta]\) follow any sampling distribution with any prior for \(\theta\). Denote \(\phi_j = g_j(\theta)\) for \(j=1, \ldots, J\), where \(g_1, \ldots, g_J\) are some known functions and \(J\in\mathbb{N}\) is fixed. Suppose that, under the loss function \(L(\theta, \widehat{\theta}) = (\widehat{\theta} - \theta )^2/\theta^2 ,\) the Bayes estimator of \(\phi_j\) is \(\widehat{\phi}_j\) for each \(j\). The parameter of interest is \[\phi := \sum_{j=1}^J w_j \phi_j, \] where \(w_1, \ldots, w_J\in\mathbb{R}\) are known and fixed. Derive the Bayes estimator of \(\phi\).
Hints: See Remark 2.2. Don’t read the hints unless you have no ideas and have tried for more than 15 mins.
Loss function
 Anonymous Pinniped    Last Modified: 2023-03-03 01:44  1 
May I ask whether $L(\Phi,\hat{\Phi}) = (\hat{\Phi} - \Phi)^2/\Phi^2$ ? Thank you.
  [Instructor] Kin Wai (Keith) Chan    Created at: 2023-03-03 10:57  0 
Good catch. You are right. For any function $h$, the loss for estimating $\xi = h(\theta)$ by $\widehat{\xi}$ is
$$L(\theta, \widehat{\xi}) = (\widehat{\xi}-\xi)^2/\xi^2.$$
Note that I always put the full model parameter $\theta$ in the first argument to emphasize the dependence on $\theta$.
Exercise 3.1 (P6-10)
  [TA] Cheuk Hin (Andy) Cheng    Created at: 0000-00-00 00:00   A3Ex3.1 0 
Consider an online platform that sells secondhand goods. Let \(y\) be the known official price of a brand new product ABC. On the platform, there are \(n\) secondhand items of product ABC having a similar condition (e.g., “like new”). Denote their prices by \(x_1, \ldots, x_n\). Assume the following model: \[\begin{aligned} \left[ x_1, \ldots, x_n \mid \theta \right] &\overset{\text{IID}}{\sim} Ga(v_0)/\theta, \end{aligned}\] where \(v_0>0\) is fixed. Assume that we have a prior belief that \(\theta\) can only take values \(t_1, \ldots, t_J\) with probability \(p_1, \ldots, p_J\), where \(t_1, \ldots, t_J, p_1, \ldots, p_J>0\) and \(J>1\) are fixed numbers such that \(p_1 + \cdots + p_J = 1\). Set \(v_0 = 1.4\) and \(J=10\). For each \(j=1, \ldots, J\), let \[t_j = \frac{j}{1000 J} \qquad \text{and} \qquad p_j \propto \left\{ 0.8 - \left\vert \frac{j}{J}-0.8 \right\vert \right\}, \qquad j=1, \ldots, J. \] Suppose that \(y=28888\) and a dataset (A3.txt) of size \(n=80\) is observed. We are interested in \[\phi = \frac{E(x_1\mid \theta)-y}{y}. \]
6. (10%) Write an R function BE(x, q=0, t, p, v0) to compute \(\widehat{\theta}_{q}\), where the inputs are as follows:
x — a vector storing \((x_1, \ldots, x_n)\).
q — the order \(q\) of the loss function. By default, set q=0.
t — a vector storing \((t_1, \ldots, t_J)\).
p — a vector storing \((p_1, \ldots, p_J)\).
v0 — the value of the hyperparameter \(v_0\).
7. (10%) In parts (7)–(9), consider the loss \(L_1(\theta, \widehat{\theta})\). Let \(\mathcal{S} = \{\widehat{\theta}_{0},\widehat{\theta}_{1},\ldots, \widehat{\theta}_{4}\}\) be the set of estimators of interest. Use simulation to plot the risk functions of \(\widehat{\theta}_{0},\widehat{\theta}_{1},\ldots, \widehat{\theta}_{4}\) against \(\theta\) on the same graph.
8. (10%) Using the graph in part (7), determine whether the estimators in \(\mathcal{S}\) are admissible.
9. (10%) Using the graph in part (7), find the minimax estimator of \(\theta\) among \(\mathcal{S}\).
10. (10%) The above prior is discrete with support \(\{t_1, \ldots, t_J\}\). Andy claimed that it is not reasonable. Instead, he modified the prior so that \(\theta\) follows \(t_j Ga(\tau_0)/\tau_0\) with probability \(p_j\) for each \(j=1, \ldots, J\), where \(\tau_0>0\) is fixed. Select an appropriate value of the hyperparameter \(\tau_0\). Using this prior, suggest an estimator of \(\phi\). Do you prefer this estimator or the estimator in part (4)? Why? (Use \(\lesssim 50\) words.)
Hints: See Remark 2.1. Don’t read the hints unless you have no ideas and have tried for more than 15 mins.
Question 6
 Anonymous Bison    Created at: 2023-03-02 00:16  1 
I would like to ask about question 6. According to the hint about adding an adjustment inside log(), do we need to adjust back? E.g., for exp(log(theta) - min(theta)), how can we extract the exp(min(theta)) term? Or is it negligible when we take the ratio of two expectations?
  [TA] Cheuk Hin (Andy) Cheng    Created at: 2023-03-02 09:11  1 
Observe that $\max_{\theta} p_0(\theta)$ is a constant free of $\theta$. Moreover, $b = \exp(\log(b))$. Therefore, the posterior satisfies $$f(\theta\mid X) \propto \exp\{\log(p_0(\theta)) - \log(\max_{\theta} p_0(\theta)) \}. $$ Therefore, the extra adjustment term can be absorbed into the normalizing constant when we compute the posterior probabilities.
  [Instructor] Kin Wai (Keith) Chan    Last Modified: 2023-03-02 13:04  2 
It is a good question. Let me explain in the following way:
  • (Before taking log) Recall that the posterior is given by
    $$f(\theta \mid x_{1:n}) = \color{red}{C} f(\theta) f(x_{1:n}\mid \theta).$$
    Hence, we can write
    $$\begin{aligned}
    f(\theta \mid x_{1:n}) &\propto \color{red}{C} f(\theta) f(x_{1:n}\mid \theta), \qquad \text{or}\\
    f(\theta \mid x_{1:n}) &\propto \color{red}{20 C} f(\theta) f(x_{1:n}\mid \theta), \qquad \text{or}\\
    f(\theta \mid x_{1:n}) &\propto \color{red}{4010 C} f(\theta) f(x_{1:n}\mid \theta).
    \end{aligned}$$
    All of them mean the same thing.
  • (After taking log) Now, taking log on both sides, we get
    $$\log f(\theta \mid x_{1:n}) = \color{red}{\log C} + \log f(\theta) + \log f(x_{1:n}\mid \theta).$$
    We observe that, after taking log, the proportionality constant $C$ becomes an additive constant $\log C$. Hence, up to an additive constant, we can write
    $$\begin{aligned}
    \log f(\theta \mid x_{1:n}) &= \color{red}{\log C} + \log f(\theta) + \log f(x_{1:n}\mid \theta), \qquad \text{or}\\
    \log f(\theta \mid x_{1:n}) &= \color{red}{\log C-2023} + \log f(\theta) + \log f(x_{1:n}\mid \theta), \qquad \text{or}\\
    \log f(\theta \mid x_{1:n}) &= \color{red}{\log C-\max\left\{ 10,20,30,40,50\right\}} + \log f(\theta) + \log f(x_{1:n}\mid \theta),
    \end{aligned}$$
    where, on the last line, $\max\left\{ 10,20,30,40,50\right\}$ is just a part of the additive constant. Once again, all of them mean the same thing.
  • (Answer) In a nutshell, we don't need to adjust back the term $\max\{\cdots\}$ because it is a part of the “meaningless” additive constant. We include it only for facilitating computation.
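The point about the additive constant can be illustrated with a short R sketch. It is not the assignment solution: the function name `post_prob` is made up, the data are simulated as a stand-in for A3.txt, and $Ga(v_0)/\theta$ is read as a Gamma distribution with shape $v_0$ and rate $\theta$, which is an assumption about the notation.

```r
# Posterior probabilities on a discrete support, computed on the log scale.
# ASSUMPTION: Ga(v0)/theta is read as Gamma(shape = v0, rate = theta).
post_prob = function(x, t, p, v0) {
  n = length(x)
  # log prior + log likelihood, dropping every term that is free of theta
  log_post = log(p) + n * v0 * log(t) - t * sum(x)
  log_post = log_post - max(log_post)  # subtract the maximum for numerical stability
  w = exp(log_post)                    # the largest weight is exactly 1
  w / sum(w)                           # normalizing removes the additive constant
}

J = 10
t = (1:J) / (1000 * J)
p = 0.8 - abs((1:J) / J - 0.8)
p = p / sum(p)

set.seed(1)
x = rgamma(80, shape = 1.4, rate = 0.0005)  # simulated data, NOT the real A3.txt
print(round(post_prob(x, t, p, v0 = 1.4), 4))
```

Because the weights are normalized in the last step, subtracting any constant from `log_post` gives the same probabilities. Subtracting the maximum is just a convenient choice that keeps `exp()` away from underflow.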
Exercise 3.1 (P1-5)
  [TA] Cheuk Hin (Andy) Cheng    Created at: 0000-00-00 00:00   A3Ex3.1 1 
Consider an online platform that sells secondhand goods. Let \(y\) be the known official price of a brand new product ABC. On the platform, there are \(n\) secondhand items of product ABC having a similar condition (e.g., “like new”). Denote their prices by \(x_1, \ldots, x_n\). Assume the following model: \[\begin{aligned} \left[ x_1, \ldots, x_n \mid \theta \right] &\overset{\text{IID}}{\sim} Ga(v_0)/\theta, \end{aligned}\] where \(v_0>0\) is fixed. Assume that we have a prior belief that \(\theta\) can only take values \(t_1, \ldots, t_J\) with probability \(p_1, \ldots, p_J\), where \(t_1, \ldots, t_J, p_1, \ldots, p_J>0\) and \(J>1\) are fixed numbers such that \(p_1 + \cdots + p_J = 1\). Set \(v_0 = 1.4\) and \(J=10\). For each \(j=1, \ldots, J\), let \[t_j = \frac{j}{1000 J} \qquad \text{and} \qquad p_j \propto \left\{ 0.8 - \left\vert \frac{j}{J}-0.8 \right\vert \right\}, \qquad j=1, \ldots, J. \] Suppose that \(y=28888\) and a dataset (A3.txt) of size \(n=80\) is observed. We are interested in \[\phi = \frac{E(x_1\mid \theta)-y}{y}. \]
  1. (10%) Write down the prior probability mass function of \(\theta\).
  2. (10%) Derive the posterior \([\theta \mid x_{1:n}]\).
  3. (10%) Compute the posterior median of \(\theta\).
  4. (10%) Suggest and compute a point estimator of \(\phi\).
  5. (10%) Under the loss \(L_{q}(\theta, \widehat{\theta}) = (\theta-\widehat{\theta})^2/\theta^q\) for some \(q\geq0\), derive the Bayes estimator \(\widehat{\theta}_{q}\) of \(\theta\).
Question 5
 Anonymous Mouse    Created at: 2023-02-24 15:38  1 
For question 5, does it suffice to write the Bayes estimator as a ratio of two expectations, $\mathsf{E}(\theta^{1-q}\mid x)/\mathsf{E}(\theta^{-q}\mid x)$, or can it be simplified even more?
  [TA] Cheuk Hin (Andy) Cheng    Created at: 2023-02-24 17:36  1 
If the answer cannot be simplified further, then it is acceptable. This reply does not imply whether your answer is correct or not.
question about 1 and 2
 Anonymous Gopher    Created at: 2023-03-01 14:59  0 
May I know whether we should write down the answer after normalizing in questions 1 and 2? Or is it OK to write the answer without normalizing, using the proportionality notation?
  [TA] Cheuk Hin (Andy) Cheng    Created at: 2023-03-02 08:52  0 
If the normalizing constant can be easily found, then you should also derive it. Moreover, the derivation may be useful in the next parts.
  [Instructor] Kin Wai (Keith) Chan    Created at: 2023-03-02 19:06  1 
Good question! Indeed, I think it is a good learning moment for all of you.
  • In general, it is NOT necessary to know the proportionality constant $c$ to define the posterior distribution.
  • However, our answer would be more complete if we also know the proportionality constant at least in the following three cases:
    1. the posterior is a named distribution, e.g., $\text{Beta}(\alpha, \beta)$.
    2. the posterior can be represented as a more interpretable distribution after knowing the value of $c$, e.g., the mixture probabilities of a distribution; see Example 2.9 and Question 1.1 of mock 2.
    3. the computation of the posterior is facilitated after knowing the value of $c$, e.g., part 6 of A3.
So, we should determine whether deriving $c$ is meaningful on a case-by-case basis. It requires us to use our wisdom to decide the degree of usefulness.
Question 4
 Anonymous Bison    Created at: 2023-03-01 16:43  1 
May I ask: when finding the point estimator of $\phi$, should we use $\mathsf{E}(\widehat{\phi}) = \phi$? Or is any other direction recommended?
  [TA] Cheuk Hin (Andy) Cheng    Created at: 2023-03-02 08:56  0 
In this question, you can propose any point estimator that you like. However, you need to explain in your answer why you suggest that point estimator. For details, you may refer to Section 3.3.2 of the lecture note.
  [Instructor] Kin Wai (Keith) Chan    Created at: 2023-03-02 18:57  0 
It is a good question.
  • The estimand is $\phi = \mathsf{E}(x_1\mid \theta)/y-1$, which is a function of $\theta$. Note that $\phi$ is a random variable.
  • A possible estimator is the Bayes estimator $\widehat{\phi} = \mathsf{E}(\phi \mid x_{1:n})$ when the loss is $L(\theta, \widehat{\phi}) = (\widehat{\phi}-\phi)^2$. In this question, you need to consider a more general loss function, and find its Bayes estimator.
  • From the Bayesian perspective, a point estimator $\widehat{\phi}$ is simply a one-number summary of the distribution of $\phi$.
  • Note that if $\widehat{\phi} = \mathsf{E}(\phi \mid x_{1:n})$ then $\mathsf{E}(\widehat{\phi}) = \mathsf{E}(\phi)$, which is the prior expectation of $\phi$. In our case,
    $$\mathsf{E}(\phi) = \sum_{j=1}^J t_j p_j.$$
    So, $\mathsf{E}(\phi)$ is a fixed and known constant fully determined by the prior distribution. Hence, there is no need to estimate $\mathsf{E}(\phi)$.
Example 3.13
  [Student helper] Martin Lee    Created at: 0000-00-00 00:00   Chp3Eg13 0 
Example 3.13 (Weighted quadratic loss ${\color{blue}\star}{\color{blue}\star}{\color{gray}\star}$). Consider the loss function: \(L(\theta, \widehat{\theta}) = w(\theta)(\widehat{\theta}- \theta)^2\), where \(w(\cdot)\) is a non-negative weight function. Prove that the Bayes estimator is \[\widehat{\theta}_{\pi} = \frac{\mathsf{E}\left\{ w(\theta) \theta\mid x \right\}}{\mathsf{E}\left\{ w(\theta)\mid x \right\}} .\]


  1. (Step 1: find posterior loss) The posterior loss is \[\begin{aligned} L(\pi, \widehat{\theta}) &= \mathsf{E}\{ w(\theta)(\widehat{\theta}-\theta)^2 \mid x\}\\ &= \mathsf{E}\bigg[ w(\theta) \bigg\{(\widehat{\theta}-\widehat{\theta}_{\pi}) - (\theta-\widehat{\theta}_{\pi})\bigg\}^2 \;\bigg\vert\; x \bigg] \\ &= \mathsf{E}\bigg[ w(\theta) (\widehat{\theta}-\widehat{\theta}_{\pi})^2 \;\bigg\vert\; x \bigg] + \mathsf{E}\bigg[ w(\theta) (\theta-\widehat{\theta}_{\pi})^2 \;\bigg\vert\; x \bigg] - 2(\widehat{\theta}-\widehat{\theta}_{\pi}) \underbrace{\mathsf{E}\bigg[ w(\theta) (\theta-\widehat{\theta}_{\pi}) \;\bigg\vert\; x \bigg]}_{A}, \end{aligned}\] where \(A = \mathsf{E}\{w(\theta)\theta\mid x\} - \widehat{\theta}_{\pi} \mathsf{E}\{w(\theta) \mid x\} = 0.\)
  2. (Step 2: find Bayes estimator) Hence, we obtain \[ L(\pi, \widehat{\theta}) = \underbrace{\mathsf{E}\bigg[ w(\theta) (\widehat{\theta}-\widehat{\theta}_{\pi})^2 \;\bigg\vert\; x \bigg]}_{B(\theta, \widehat{\theta})} + \underbrace{\mathsf{E}\bigg[ w(\theta) (\theta-\widehat{\theta}_{\pi})^2 \;\bigg\vert\; x \bigg]}_{\text{Does not depend on $\widehat{\theta}$}}.\] So, \(\widehat{\theta}\mapsto L(\pi, \widehat{\theta})\) is minimized iff \(\widehat{\theta}\mapsto B(\theta, \widehat{\theta})\) is minimized. Clearly, \(B(\theta, \widehat{\theta})\) is minimized (to zero, the smallest possible value) iff \(\widehat{\theta} = \widehat{\theta}_{\pi}\). By Theorem 3.1, \(\widehat{\theta}_{\pi}\) is the Bayes estimator. \(\;\blacksquare\)
 Anonymous Mouse    Created at: 2023-02-18 20:51  0 
Can someone please explain why the expression “A” equals 0?
 Anonymous Mosquito    Created at: 2023-02-23 23:38  0 
$\widehat{\theta}_{\pi}$ is defined as $\mathsf{E}\{w(\theta)\theta\mid x\}/\mathsf{E}\{w(\theta)\mid x\}$ in the question. Plug $\widehat{\theta}_{\pi}$ into A and you will get zero.

 Anonymous Hawk    Created at: 2023-03-03 13:46  0 
Can you explain more clearly?
 Anonymous Hawk    Created at: 2023-03-03 13:47  0 
Does the question really define it? Isn't it just asking us to prove the BE into that form?
  [TA] Di Su    Last Modified: 2023-03-16 17:40  0 
We have defined A to be the term $\mathrm{E}\left[w(\theta)\left(\theta-\hat{\theta}_\pi\right) \mid x\right]$. And we see
$$\begin{aligned}\mathrm{E}\left[w(\theta)\left(\theta-\hat{\theta}_\pi\right) \mid x\right]
&=\mathrm{E}\left[w(\theta)\left(\theta-\frac{\mathrm{E}\{w(\theta) \theta \mid x\}}{\mathrm{E}\{w(\theta) \mid x\}}\right) \mid x\right]\\
&=\mathrm{E}\left[w(\theta)\theta\mid x\right]-\mathrm{E}\left[w(\theta)\frac{\mathrm{E}\{w(\theta) \theta \mid x\}}{\mathrm{E}\{w(\theta) \mid x\}}\mid x\right]\\
&=\mathrm{E}\left[w(\theta)\theta\mid x\right]-\mathrm{E}\{w(\theta) \theta \mid x\}\,\mathrm{E}\left[\frac{w(\theta)}{\mathrm{E}\{w(\theta) \mid x\}}\mid x\right]\\
&=\mathrm{E}\left[w(\theta)\theta\mid x\right]-\mathrm{E}\{w(\theta) \theta \mid x\}\frac{\mathrm{E}\left[w(\theta)\mid x\right]}{\mathrm{E}\{w(\theta) \mid x\}}\\
&=\mathrm{E}\left[w(\theta)\theta\mid x\right]-\mathrm{E}\{w(\theta) \theta \mid x\}\\
&=0.\end{aligned}$$
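The result can also be checked numerically. The following R sketch uses a made-up discrete posterior and the example weight $w(\theta)=\theta^{-2}$ (both are assumptions chosen only for illustration). It compares the ratio formula with a direct minimization of the posterior expected loss and evaluates the term $A$.

```r
# Hypothetical discrete posterior, used only to illustrate the formula
theta = c(0.5, 1, 2, 4)           # support of the posterior
post  = c(0.1, 0.4, 0.3, 0.2)     # posterior probabilities (sum to 1)
w     = function(th) th^(-2)      # example weight function

# Closed form: E{w(theta) theta | x} / E{w(theta) | x}
be_formula = sum(w(theta) * theta * post) / sum(w(theta) * post)

# Direct minimization of the posterior expected loss over the estimate d
post_loss  = function(d) sum(w(theta) * (d - theta)^2 * post)
be_numeric = optimize(post_loss, interval = range(theta))$minimum

# Term A from the proof, evaluated at the closed-form estimator
A = sum(w(theta) * (theta - be_formula) * post)
print(c(formula = be_formula, numeric = be_numeric, A = A))
```

The two estimates should agree up to the tolerance of `optimize`, and `A` should be zero up to floating-point error.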
Exercise 2.1
  [TA] Di Su    Created at: 0000-00-00 00:00   A2Ex1 3 
Related last year's exercise and discussion can be found here.
Example 1 ${\color{red}\star}{\color{black}\star}{\color{black}\star}$ Different types of priors (50$\%$) Suppose that a new virus that causes a pandemic was recently discovered. A new vaccine against the new virus was just developed. The aim of this exercise is to estimate the vaccine’s efficacy \(\theta\), which is defined as \[\begin{aligned} \theta &= \frac{\pi_0 - \pi_1}{\pi_1}, \end{aligned}\] where \(\pi_0\) and \(\pi_1\) are the attack rates of unvaccinated and vaccinated humans, respectively. In a preclinical stage, the vaccine was tested on laboratory animals (not humans). The following data were obtained.
                     Unvaccinated animals   Vaccinated animals
Infected animals                       14                    3
Uninfected animals                      1                   17
Now, the vaccine is tested on healthy humans. Suppose there are \(n+m\) people. Among them, \(m\) people are randomly assigned to the control group (\(j=0\)), and the rest to the treatment group \((j=1)\). Denote \[x_{i}^{(j)} = \mathbb{1}(\text{the $i$th person in group $j$ is infected})\] for each \(i\) and \(j\). Let \[[ x_1^{(0)}, \ldots, x_m^{(0)} \mid \pi_0, \pi_1 ] \overset{ \text{iid}}{\sim} \text{Bern}(\pi_0) \qquad \text{and} \qquad [ x_1^{(1)}, \ldots, x_n^{(1)} \mid \pi_0, \pi_1 ] \overset{ \text{iid}}{\sim} \text{Bern}(\pi_1)\] be two independent samples given \(\pi_0, \pi_1\).
  1. (40%) Suggest, with a brief explanation (\(\lesssim\) 20 words each) or mathematical derivation,
    1. a conjugate prior on \((\pi_0, \pi_1)\),
    2. an informative prior on \((\pi_0, \pi_1)\),
    3. a non-informative prior on \((\pi_0, \pi_1)\), and
    4. a weakly-informative prior on \((\pi_0, \pi_1)\).
    Note: you may use other information not provided in the question for parts (b) and (d).
  2. (10%) Emily was a statistician who analyzed the above dataset. After observing the \(m+n\) observations in the clinical stage, she performed a rough analysis and strongly believed that \(\pi_0\geq \pi_1\). So, she used the following prior \(f(\pi_0, \pi_1) \propto \mathbb{1}(\pi_0\geq \pi_1).\) Comment. (Use \(\lesssim 50\) words.)
Hints:
  1. You may define a prior for \((\pi_0,\pi_1)\) so that \(\pi_0\) and \(\pi_1\) are independent.
    1. Using the independence of \(\pi_0\) and \(\pi_1\), you can derive the conjugate prior individually.
    2. This part illustrates the power of Bayesian analysis. Data based on humans and animals are likely to be generated from different mechanisms. However, it is still natural to approximate one mechanism by the other. So, you may construct a prior based on the animal dataset to reflect your belief. Some possible methods are suggested below:
      • For \(j=0,1\), construct a conjugate prior for \(\pi_j\) with hyper-parameters selected according to their interpretation in Example 2.13 of the lecture note.
      • For \(j=0,1\), define \(\pi_j \sim \text{Beta}(\alpha^{(j)},\beta^{(j)})\), where \(\alpha^{(j)}\) and \(\beta^{(j)}\) are selected so that \[\begin{aligned} \mathsf{E}(\pi_j) = \widehat{\pi}_j + (\text{Bias} ) \qquad \text{and} \qquad {\mathsf{Var}}(\pi_j) = \widehat{\sigma}^2_j \times (\text{Inflation factor}), \end{aligned}\] where \(\widehat{\pi}_j\) is a frequentist estimator of \(\pi_j\), and \(\widehat{\sigma}^2_j\) is a frequentist estimator of \({\mathsf{Var}}(\widehat{\pi}_j)\) as a proxy of \({\mathsf{Var}}(\pi_j)\). If you firmly believe your location estimate, then you may set the bias to be zero and set the inflation factor to one. You may use the fact that if \(\xi \sim \text{Beta}(\alpha, \beta)\) satisfies \(\mathsf{E}(\xi)=\mu\) and \({\mathsf{Var}}(\xi)=\sigma^2\), then \[\alpha = \frac{\mu^2(1-\mu)}{\sigma^2}-\mu \qquad \text{and} \qquad \beta=\alpha\left(\frac{1}{\mu}-1 \right).\]
    3. You may use the Jeffreys prior on each parameter.
    4. Is there any hardly denying information on \(\pi_0\) and \(\pi_1\)?
  2. What is a proper way to specify an informative prior?
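The moment-matching hint above (choosing \(\alpha^{(j)}\) and \(\beta^{(j)}\) from a target mean and variance) can be checked numerically. The following is a minimal sketch in Python, not part of the original hints; the function names `beta_from_moments` and `beta_moments`, the estimate 0.3, the sample size 50, and the inflation factor 4 are all hypothetical values chosen for illustration.

```python
def beta_from_moments(mu, sigma2):
    """Return (alpha, beta) of the Beta distribution with mean mu and variance sigma2."""
    if not 0.0 < mu < 1.0:
        raise ValueError("mu must lie in (0, 1)")
    if not 0.0 < sigma2 < mu * (1.0 - mu):
        raise ValueError("sigma2 must lie in (0, mu * (1 - mu))")
    alpha = mu**2 * (1.0 - mu) / sigma2 - mu
    beta = alpha * (1.0 / mu - 1.0)
    return alpha, beta


def beta_moments(alpha, beta):
    """Mean and variance of Beta(alpha, beta), used to verify the matching."""
    total = alpha + beta
    mean = alpha / total
    var = alpha * beta / (total**2 * (total + 1.0))
    return mean, var


# Hypothetical inputs: a frequentist estimate pi_hat from a made-up sample of
# size n_hyp, zero bias, and an inflation factor of 4 (a weaker belief).
pi_hat = 0.3
n_hyp = 50
var_hat = pi_hat * (1.0 - pi_hat) / n_hyp      # estimated Var(pi_hat)
alpha, beta = beta_from_moments(pi_hat + 0.0, var_hat * 4.0)
print(alpha, beta, beta_moments(alpha, beta))  # recovered moments should match the targets
```

The variance check in `beta_from_moments` reflects the fact that a Beta distribution with mean \(\mu\) cannot have variance \(\mu(1-\mu)\) or larger, so a very large inflation factor may be infeasible.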
Easy Difficult    Number of votes: 12
question about the hint
 Anonymous Gopher    Created at: 2023-02-11 17:41  2 
Does the “proper” in hint 2 refer to a proper density? Or does it just mean “appropriate”?
Show 2 reply
 Anonymous Peacock    Created at: 2023-02-12 10:06  2 
I guess it means 'appropriate', similar to Q2 of Example 2.23 in Chp2's lecture note.

  [Instructor] Kin Wai (Keith) Chan    Created at: 2023-02-13 19:01  2 
Yes! It simply means “appropriate” in the layman's sense.
question about exercise2.1
 Anonymous Gopher    Created at: 2023-02-12 16:26  1 
Thank you peacock! I got the meaning!
I still have one question: can I separately define the priors for Π0 and Π1, then multiply them to get the answer in all of parts a, b, c and d of this question?
Show 2 reply
 Anonymous Peacock    Created at: 2023-02-13 10:30  1 
I did it this way because I think that the two groups are completely independent.
  [Instructor] Kin Wai (Keith) Chan    Created at: 2023-02-13 19:05  4 
It is a good question.
  • Yes, you can define them separately and then multiply them to form the prior pdf when you believe that $\pi_0$ and $\pi_1$ are independent a priori.
  • No, you can't if you simply wish to make it simple but are unaware of the meaning behind it.
About "hardly denying information" in hint
 Anonymous Bobolink    Created at: 2023-02-12 18:11  0 
For this question, in my view, the only hardly denying information is just that the probability is in (0,1) which is meaningless. Maybe we should search for other information on the internet?
Show 1 reply
  [Instructor] Kin Wai (Keith) Chan    Created at: 2023-02-13 19:15  2 
Good question again.
  • Different people may have different views on “hardly denying information”.
  • It is a completely open-ended question. There is no fixed answer to that.
  • You may search for some numbers, stories, studies, or information that may help you to define such a prior.
 Anonymous Mouse    Last Modified: 2023-02-13 20:12  0 
Show reply
question 2
 Anonymous Warbler    Created at: 2023-02-17 16:31  0 
For Emily's problem, can we consider her prior as a weakly informative prior? For most of the vaccines created, the attack rate would go down.
Even though Emily used the human data to arrive at this belief, it is more like a common belief, so it may not violate the prohibition on using the same dataset to derive both the prior and the posterior distribution.
Or has the belief already violated the rules? thanks!
Show 1 reply
  [TA] Di Su    Created at: 2023-02-17 19:14  1 
Yes, you may regard it as a weakly informative prior. However, note that Emily proposed the prior after observing the $m+n$ observations in the clinical stage, so her prior is based on the data.
Exercise 2.2
  [TA] Di Su    Created at: 0000-00-00 00:00   A2Ex2 2 
Related last year's exercise and discussion can be found here.
Example 2 ${\color{red}\star}{\color{red}\star}{\color{black}\star}$ Posterior of function of parameters (50$\%$). Consider the problem in Exercise 1. In this exercise, assume the following prior for \((\pi_0, \pi_1)\): \[f(\pi_0, \pi_1) = \mathbb{1}(\pi_0,\pi_1\in(0,1)).\]
  1. (10%) Find the posterior \([\pi_0, \pi_1\mid x_{1:m}^{(0)},x_{1:n}^{(1)}]\).
  2. (10%) Find the posterior \([\theta\mid x_{1:m}^{(0)},x_{1:n}^{(1)}]\). Use representation, or leave your answer in terms of a PDF.
  3. (10%) Is the posterior \([\theta\mid x_{1:m}^{(0)},x_{1:n}^{(1)}]\) conjugate with the prior \(\theta\)? Briefly explain. (Use \(\lesssim 50\) words.)
  4. (10%) Is the posterior \([\theta\mid x_{1:m}^{(0)},x_{1:n}^{(1)}]\) proper? Briefly explain. (Use \(\lesssim 50\) words.)
  5. (10%) Graphically present the prior and posterior PDFs of \(\theta\) if the following dataset is obtained. Briefly comment. (Use \(\lesssim 50\) words.)
                          Unvaccinated humans   Vaccinated humans
    Infected humans               298                  343
    Uninfected humans             202                  657
  6. (Optional) Try different priors of \((\pi_0, \pi_1)\) and compare the effect on the posterior of \(\theta\). Briefly comment.
  1. Are \(\pi_0\) and \(\pi_1\) independent a priori?
  2. Representation leads to a much neater answer.
  3. As long as the prior and posterior admit the same distributional form (not necessarily a named class of distributions), they are regarded as conjugate.
  4. A proper distribution means that it well defines a random variable.
  5. This example shows the dual representation of Bernoulli data. For example, a sample of \(n=10\) iid Bernoulli RVs can be represented in one of the following forms:
    • the data are: \(0,0,1,1,0,0,0,0,1,0\); or
    • numbers of 1s and 0s are 3 and 7.
    To plot the PDF, you may use one of the following methods.
    • (Method 1: algebraic) Derive the analytical form of the PDF. Then plot it as in Assignment 1.
    • (Method 2: representation) Simulate \(N=10^{6}\) replications of the sample based on the representation. Then use a kernel density estimator (see Chapter 8 of Stat3005) to estimate the PDF. For example, to plot the density of \(W = (A-10)/(1+B^2)\), where \(A\sim \text{N}(0,1)\) and \(B\sim \text{Exp}(1)\) are independent, you may simulate \(A\) and \(B\), compute \(W\) for each replication, and plot the kernel density estimate of the simulated values.
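The representation method for \(W = (A-10)/(1+B^2)\) can be sketched in a few lines. The sketch below is written in Python rather than R and uses only the standard library, so it is an illustration of the idea and not the course's own code. The seed, the reduced number of replications, the normal-reference bandwidth, and the evaluation grid are all assumptions made here.

```python
import math
import random

random.seed(2023)  # arbitrary seed, only for reproducibility

N = 20_000  # far smaller than the suggested 10^6, to keep pure Python fast
# Representation: W = (A - 10) / (1 + B^2), A ~ N(0, 1), B ~ Exp(1), independent
w = [(random.gauss(0.0, 1.0) - 10.0) / (1.0 + random.expovariate(1.0) ** 2)
     for _ in range(N)]

# Gaussian kernel with the normal-reference bandwidth 1.06 * sd * N^(-1/5)
# (an assumed choice; other bandwidth rules are equally valid)
mean_w = sum(w) / N
sd_w = math.sqrt(sum((v - mean_w) ** 2 for v in w) / (N - 1))
h = 1.06 * sd_w * N ** (-0.2)


def kde(t):
    """Kernel density estimate of the PDF of W at the point t."""
    total = sum(math.exp(-0.5 * ((t - v) / h) ** 2) for v in w)
    return total / (N * h * math.sqrt(2.0 * math.pi))


step = 0.25
grid = [-16.0 + step * i for i in range(73)]  # from -16 to 2
density = [kde(t) for t in grid]
for t, d in zip(grid[::8], density[::8]):
    print(f"w = {t:6.2f}   estimated density = {d:.4f}")
```

The pairs in `grid` and `density` can then be passed to any plotting tool. The same recipe applies to \(\theta\) once it is written in terms of random variables that are easy to simulate.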
Easy Difficult    Number of votes: 16
Question about Q2
 Anonymous Peacock    Created at: 2023-02-13 10:16  1 
I am not sure whether we should transform the posterior derived in Q1 to theta by the Jacobian directly, or whether we need to transform the sampling distribution and the prior to theta separately and then multiply them. Thanks
Show 1 reply
 Anonymous Peacock    Created at: 2023-02-13 20:24  1 
I think I've found out how to solve by representation. Please just ignore this question:)
Rewriting posterior
 Anonymous Mouse    Created at: 2023-02-13 20:12  1 
Let's say I successfully find f(x0|pi0) and f(x1|pi1). Then, since theta can be written in terms of pi0 and pi1, is f(x0,x1|theta) simply (f(x0|pi0)-f(x1|pi1))/f(x1|pi1), or is this mathematically invalid?
Show 1 reply
  [TA] Di Su    Last Modified: 2023-02-13 23:15  5 
Since $f(x_0,x_1\mid\theta)$ is the density of $x_0,x_1$, it is not the same as how $\theta$ is expressed in terms of $\pi_0$ and $\pi_1$
question about the representation
 Anonymous Gopher    Created at: 2023-02-14 00:38  2 
When I use representation to represent theta, theta is expressed by 4 independent variables following gamma distributions (by the representation of the beta distribution). Is it OK to leave these 4 variables as the final answer? Or should I find a way to further simplify the result?
Besides, if I successfully represent theta in terms of other fundamental variables, can I say that theta is already well defined? (Actually, I am confused about the meaning of “well define” in the hint.)
Show 1 reply
  [TA] Di Su    Last Modified: 2023-02-14 19:19  5 
You can use Beta random variables in your representation, so there is no need to further represent them by Gamma random variables.
And you are right, a fundamental random variable is already well-defined because we know its PDF and CDF (and the integration of its PDF is finite).
Question about representation
 Anonymous Mosquito    Created at: 2023-02-16 21:25  1 
May I ask if theta is represented by beta RVs only? I tried to use beta RVs to do the operation but I cannot find a well-defined distribution to represent it. Should I leave it there as a product of beta distributions, or should I use the Jacobian if I can't find a well-defined distribution? For Q3, what is meant by the same distributional form? For example, if X1 ~ exp(1), X2~exp(2), Z ~N(0,1), does X1+Z have the same distributional form as X2+Z?
Show 1 reply
  [TA] Di Su    Last Modified: 2023-02-17 19:18  0 
For $\theta$, you can simply use Beta r.v.'s to represent it, and you don't need to derive the exact formula of its PDF.
For Q3, your understanding is correct. Having the same distribution form means they come from the same distribution family with possibly different parameters.
Further question about "well defined"
 Anonymous Pinniped    Created at: 2023-02-17 02:08  1 
Thanks for the above answer. A fundamental variable is obviously well-defined. But why is it that “if a distribution can be expressed in terms of fundamental random variables, then it is also well-defined (i.e. the integral of its pdf is finite)”? How can we prove this?
Show 1 reply
 Anonymous Swordtail    Last Modified: 2023-02-17 18:11  1 
This is my proof. Please comment if I am wrong.
Note that multiple random variables $X_1, …, X_n$ can be regarded as a single multi-dimensional random variable $X = (X_1, …, X_n)$.
Let $X$ be a well-defined random variable with support $\mathcal{X}$. Let $Z=g(X)$, where $g$ is any function from $\mathcal{X}$ to $\mathbb{R}$.
\[\begin{aligned}
P(Z\le z) &= P(g(X)\le z) \\
&= \int_{\mathcal{X}} P(g(X)\le z \mid X=x)\, f_X(x)\, dx \\
&= \int_{\mathcal{X}} P(g(x)\le z)\, f_X(x)\, dx \\
&= \int_{\mathcal{X}} \mathbb{1}_{\{g(x)\le z\}}\, f_X(x)\, dx \\
&\le \int_{\mathcal{X}} f_X(x)\, dx \\
&= 1.
\end{aligned}\]
Therefore, the CDF of $Z$ is within $[0, 1]$, meaning that $Z$ is also well-defined.
Example 1.11
  [Student helper] Martin Lee    Created at: 0000-00-00 00:00   Chp1Eg11 0 
Example 1.11 (${\color{blue}\star}{\color{gray}\star}{\color{gray}\star}$ Bernoulli-normal (A1 Fall 2019)). Let \(\sigma>0\) be a known constant, and \[\begin{aligned} [x_1, \ldots, x_n \mid \theta] &\overset{ \text{iid}}{\sim} \text{N}(2\theta+1, \sigma^2) ;\\ \theta &\sim \text{Bern}(1/2).\end{aligned}\]
  1. Are \(x_{1:n}\) and \(x_{n+1}\) independent? Explain.
  2. Find the posterior distribution \([\theta\mid x_{1:n}]\).
  3. Find the prior predictive distribution \([x_{n+1}]\).
  4. Find the posterior predictive distribution \([x_{n+1}\mid x_{1:n}]\).
  5. Suppose \(\sigma^2=9/10\) and \(x_{1:6} = (2.312,\; 2.351,\; 2.742,\; 2.895,\; 4.574,\; 3.780)^{\text{T}}\). Use R to plot the following prior predictive and posterior predictive distributions on the same graph. Interpret. \[[x_1], \quad [x_2\mid x_1], \quad [x_3\mid x_{1:2}], \quad [x_4 \mid x_{1:3}], \quad [x_5 \mid x_{1:4}], \quad [x_6 \mid x_{1:5}], \quad [x_{7} \mid x_{1:6}].\]
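One possible way to organise the computation behind part 5 is sketched below, in Python rather than R and only as an illustration. Since \(\theta\) takes only the values 0 and 1, its posterior reduces to a single probability that can be updated one observation at a time by Bayes' rule, and each predictive distribution is then a two-component normal mixture. The names `normal_pdf`, `predictive_pdf` and `p` are illustrative, not from the lecture note.

```python
import math

sigma2 = 9 / 10
x = [2.312, 2.351, 2.742, 2.895, 4.574, 3.780]


def normal_pdf(t, mean, var):
    """Density of N(mean, var) at t."""
    return math.exp(-0.5 * (t - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


# p[k] = P(theta = 1 | x_1, ..., x_k); p[0] is the prior probability 1/2
p = [0.5]
for xi in x:
    w1 = p[-1] * normal_pdf(xi, 3.0, sigma2)          # theta = 1 gives mean 2*1 + 1 = 3
    w0 = (1.0 - p[-1]) * normal_pdf(xi, 1.0, sigma2)  # theta = 0 gives mean 2*0 + 1 = 1
    p.append(w1 / (w1 + w0))


def predictive_pdf(t, k):
    """Density of [x_{k+1} | x_{1:k}] at t; k = 0 gives the prior predictive."""
    return p[k] * normal_pdf(t, 3.0, sigma2) + (1.0 - p[k]) * normal_pdf(t, 1.0, sigma2)


for k, pk in enumerate(p):
    print(f"k = {k}: P(theta = 1 | x_1..x_k) = {pk:.6f}, "
          f"predictive density at 3 = {predictive_pdf(3.0, k):.4f}")
```

Evaluating `predictive_pdf(t, k)` over a grid of `t` for \(k=0,\ldots,6\) gives the seven curves requested in the question.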
Easy Difficult    Number of votes: 1
I am confused about how this equation in the answer was derived.
 Anonymous Peacock    Created at: 2023-02-17 01:28  0 
$\mathsf{Cov}(x_1, x_{n+1}) = \mathsf{E}\{\mathsf{Cov}(x_1, x_{n+1}\mid\theta)\} + \mathsf{Cov}\{\mathsf{E}(x_1\mid\theta), \mathsf{E}(x_{n+1}\mid\theta)\}$
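This identity is the law of total covariance, conditioning on $\theta$. A sketch of where it comes from, writing $X=x_1$ and $Y=x_{n+1}$ and using the tower property $\mathsf{E}(XY)=\mathsf{E}\{\mathsf{E}(XY\mid\theta)\}$:
\[\begin{aligned}
\mathsf{Cov}(X,Y) &= \mathsf{E}\{\mathsf{E}(XY\mid\theta)\} - \mathsf{E}\{\mathsf{E}(X\mid\theta)\}\,\mathsf{E}\{\mathsf{E}(Y\mid\theta)\} \\
&= \mathsf{E}\{\mathsf{Cov}(X,Y\mid\theta) + \mathsf{E}(X\mid\theta)\mathsf{E}(Y\mid\theta)\} - \mathsf{E}\{\mathsf{E}(X\mid\theta)\}\,\mathsf{E}\{\mathsf{E}(Y\mid\theta)\} \\
&= \mathsf{E}\{\mathsf{Cov}(X,Y\mid\theta)\} + \mathsf{Cov}\{\mathsf{E}(X\mid\theta), \mathsf{E}(Y\mid\theta)\}.
\end{aligned}\]
In Example 1.11, assuming as in the model that the observations are conditionally independent given $\theta$ with $\mathsf{E}(x_i\mid\theta)=2\theta+1$, this gives
\[\mathsf{Cov}(x_1, x_{n+1}) = \mathsf{E}(0) + \mathsf{Cov}(2\theta+1,\, 2\theta+1) = 4\,\mathsf{Var}(\theta) = 4\cdot\tfrac{1}{2}\cdot\tfrac{1}{2} = 1 \neq 0,\]
so $x_1$ and $x_{n+1}$ are marginally correlated even though they are independent given $\theta$.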
Show reply
Example 2.2
  [Student helper] Martin Lee    Created at: 0000-00-00 00:00   Chp2Eg2 0 
Example 2.2 (${\color{blue}\star}{\color{gray}\star}{\color{gray}\star}$ Transforming a location parameter to a scale parameter). Invariant priors for location and scale parameters in Theorem 2.1 do NOT contradict each other. Indeed, we can derive the less intuitive result (2) from the more trivial result (1). Let \([x\mid \theta] \sim f(\cdot\mid \theta)= f_0(\cdot/\theta)/\theta\) as in the definition of a scale family. Then we can represent \(x\) as \[x = \theta z,\] where \(z \sim f_0(\cdot)\). It implies that \[\log x = \log z + \log\theta,\] which can be viewed as a location model for \(y =\log x\) with a location parameter \(\phi = \log\theta\). By Theorem 2.1 (1), the invariant prior of \(\phi\) is \[f_{\phi}(\phi) \propto 1. \tag{2.4}\] Using the transformation \(\phi = \log \theta\), we can derive back the PDF of \(\theta\) from (2.4). Since \(\text{d}\phi/\text{d}\theta = 1/\theta\), we have \[f_{\theta}(\theta) \;\propto\; \frac{1}{\theta} f_{\phi}(\log\theta) \;\propto\; \frac{1}{\theta},\] which gives us back part (2) of Theorem 2.1.\(\;\blacksquare\)
Easy Difficult    Number of votes: 0
how to represent x
 Anonymous Warbler    Created at: 2023-02-16 16:06  0 
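I still cannot fully understand why $x=\theta z$.
$z\sim f_0(\cdot)$, $[x\mid\theta]\sim f_0(\cdot/\theta)/\theta$.
If $x=\theta z$, then $x\sim\theta f_0(\cdot)$? But how to interpret $[x\mid\theta]\sim f_0(\cdot/\theta)/\theta$?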
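A short change-of-variables sketch may help here, assuming $\theta>0$ and writing $F_0$ for the CDF of $z$. Multiplying the random variable by $\theta$ does not multiply its density by $\theta$; it rescales the argument and divides by $\theta$:
\[\begin{aligned}
P(x\le t\mid\theta) &= P(\theta z\le t) = P(z\le t/\theta) = F_0(t/\theta), \\
f(t\mid\theta) &= \frac{\text{d}}{\text{d}t}F_0(t/\theta) = \frac{1}{\theta}\,f_0\!\left(\frac{t}{\theta}\right).
\end{aligned}\]
For instance, if $z\sim\text{Exp}(1)$ with $f_0(z)=e^{-z}$ for $z>0$, then $x=\theta z$ has density $e^{-t/\theta}/\theta$ for $t>0$, which is exactly $f_0(t/\theta)/\theta$.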

Show reply
