Exercise 4.1 [OLD]
[TA] Di Su   2022A4Ex1
Exercise 1 (${\color{red}\star}{\color{red}\star}{\color{gray}\star}$ - Testing and region estimation (60%)).
Let \[\begin{aligned}
& [x_{1:n}\mid\theta] \overset{\text{iid}}{\sim} \text{N}(\theta, \theta^2), \\
& \theta \sim \theta_0\,\text{Exp}(1),
\end{aligned}\]
where \(\theta_0=0.5\). Suppose the
dataset A4Q1.csv
is observed. The goal is to perform inference on \(\theta\).
Remark 1 (Hints for Exercise 2).
Difficulty poll (Easy to Difficult). Number of votes: 29
Question 1
Anonymous Orangutan   Last Modified: 2022-03-21 10:21
I ran into difficulty because this is the first time I have seen an Exp–Normal model with N($\theta, \theta^2$). (I have only studied the cases where the mean is known and the variance unknown, or the mean unknown and the variance known.) There is a factor $\frac{1}{\sqrt{2\pi\theta^2}}$, so $\theta^{-n}$ appears in the posterior. Also, $\exp(-2\theta)\times\exp\!\left(-\frac{\sum(x_i-\theta)^2}{2\theta^2}\right)$ cannot be written as $e^{-\theta\beta}$ with $\beta$ free of $\theta$. So I think the posterior is not a commonly used named distribution (no longer Exp, Gamma, …). Then I cannot understand why, and how, we can use dexp() in this question.
Anonymous Giraffe   Created at: 2022-03-22 11:07
I have the same question. I understood that the prior is an exponential distribution and the sampling distribution is a normal distribution, so the posterior is proportional to their product. If so, are we expected to use dexp() for the exponential part and multiply it by some extra terms?
[Instructor] Kin Wai (Keith) Chan   Created at: 2022-03-22 11:22
Anonymous Hedgehog   Created at: 2022-03-31 17:42
In my opinion, the log pdf of the prior $\theta_0\,\text{Exp}(1)$ is a simple expression, so it seems that the dexp() function does not really help with the coding.
dexp()
Anonymous Bat   Last Modified: 2022-03-26 11:49
I'm confused by the log=TRUE/FALSE argument of dexp(). I rewrote the posterior and split it into several parts, each of which can be coded with dexp(). When I try to adjust the shape of the posterior, it seems that some parts need log=TRUE and some need log=FALSE, and they do influence the shape. But I can't tell why … could you please give me some hints on this?
Anonymous Hyena   Created at: 2022-03-27 13:04
Why do you write dexp(t^(-2), rate = 1/2, log = T) and dexp(t^(-2), rate = 1, log = T) separately? Also, how do you obtain dexp(log(t), rate = n*sum(x^2), log = T)? I only have a similar term, dexp(log(t), rate = n, log = T), without the factor sum(x^2). In my posterior I rewrote $\theta^{-n}$ as $e^{-n\log\theta}$, so I can't understand why you include sum(x^2). Would you mind providing an explanation? T^T Thank you!
Anonymous Jackal   Created at: 2022-03-27 15:41
Dude, doesn't the log argument just control whether dexp() returns the log-density or the ordinary density?
Anonymous Dolphin   Created at: 2022-03-27 16:11
Details of dexp() can be found here: https://stat.ethz.ch/R-manual/R-devel/library/stats/html/Exponential.html. log=TRUE or FALSE decides whether the returned value is p or log(p).
Anonymous Moose   Last Modified: 2022-03-28 11:59
We have to be careful that log (in dexp()) and log.p (in pexp()/qexp()) are different arguments. For dexp(), log=TRUE returns the log density. Recall that the trick for computing the density is to work on the log scale, so setting log=TRUE is helpful for this trick.
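A tiny illustration of why working on the log scale matters (the numbers here are arbitrary and not part of the assignment):

```r
# The density itself underflows to 0 in double precision,
# but its logarithm is perfectly representable.
p     <- dexp(2000, rate = 1)              # true value e^(-2000), far below
                                           # the smallest double (~1e-308)
log_p <- dexp(2000, rate = 1, log = TRUE)  # log f(2000) = log(1) - 2000
c(p, log_p)                                # 0 and -2000
```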
[Instructor] Kin Wai (Keith) Chan   Created at: 2022-03-28 00:12
The R code dexp(...) is a simple way to define the PDF of an exponential RV. There are two options $\texttt{log=TRUE}$ and $\texttt{log=FALSE}$:
\begin{align*} \log f(\theta \mid x_{1:n}) &= C + \log f(\theta) + \log f(x_{1:n}\mid \theta) \\ &= C + \texttt{dexp}(\theta, \texttt{rate} = \ldots , \texttt{log}=\texttt{TRUE}) + \log \left( \prod_{i=1}^n \ldots \right), \end{align*} where $C$ is a constant that is free of $\theta$.
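To make the display above concrete, here is a minimal R sketch of the log posterior (up to the constant $C$). Note the simulated x below is only a placeholder, since A4Q1.csv is not reproduced in this thread:

```r
set.seed(1)
x <- rnorm(20, mean = 1.2, sd = 1.2)   # placeholder for the real A4Q1.csv data

log_post <- function(theta, x) {
  # log prior: theta ~ 0.5 * Exp(1) is the same as Exp(rate = 2)
  lp <- dexp(theta, rate = 2, log = TRUE)
  # log likelihood: x_i | theta ~ N(theta, theta^2), i.e. sd = theta
  ll <- sum(dnorm(x, mean = theta, sd = theta, log = TRUE))
  lp + ll   # equals log f(theta | x_{1:n}) minus the constant C
}

log_post(1.2, x)   # finite, even when the un-logged posterior would underflow
```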
Anonymous Ibex   Created at: 2022-03-31 02:39
For coding the posterior, I first coded log_post as usual in asm3:
log_post = (-n/2)*log(2*pi) + dexp(theta, rate = 2, log = TRUE) - n*log(theta) - sum(x^2)/(2*theta^2) + sum(x)/theta - n/2
just like the format given by Keith: C + dexp(…) + (product term). Then I tried to shift the values with
new_log_post = log_post - min(log_post)
but the values are still so large that I cannot take exp() to get adj_post. Is my approach correct, or is there an error in my log_post? (My theta is a sequence from 0 to 5.)
Anonymous Orangutan   Created at: 2022-03-31 07:42
I think you can follow these steps:
1. Ignore (-n/2)*log(2*pi) and -n/2; they are constants in $\theta$.
2. Subtract max(log_post) instead of min(log_post): the aim is to prevent the shifted log posterior from taking large positive values (which overflow in exp()), while very negative values are harmless.
3. Narrow the range of $\theta$, e.g. to [1, 1.5], which you may find by an eyeball test.
I hope this solves the problem.
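The three steps might look like the following sketch in R (simulated data as a stand-in for A4Q1.csv; the grid endpoints are just a guess from an eyeball test):

```r
set.seed(1)
x <- rnorm(20, mean = 1.2, sd = 1.2)   # stand-in for A4Q1.csv
n <- length(x)

d     <- 0.001
theta <- seq(0.5, 3, by = d)           # step 3: a bounded grid for theta

# step 1: the constants (-n/2)*log(2*pi) and -n/2 are dropped
log_post <- dexp(theta, rate = 2, log = TRUE) -
  n * log(theta) - sum(x^2) / (2 * theta^2) + sum(x) / theta

# step 2: subtract the maximum (not the minimum), so exp() never overflows
post <- exp(log_post - max(log_post))
post <- post / sum(post * d)           # normalize by a Riemann sum

sum(post * d)                          # = 1 up to floating-point error
```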
Anonymous Ibex   Created at: 2022-03-31 14:12
Thanks for your help!!! I solved the problem now!
Anonymous Ifrit   Last Modified: 2022-04-01 17:15
I agree. Perhaps we could write the posterior density in terms of exp(…) directly and then see whether it is computable.
Q1
Anonymous Kraken   Created at: 2022-03-28 04:32
Do we have to find the normalising constant before plotting the posterior density function?
[TA] Di Su   Created at: 2022-03-29 10:08
Yes, you need to find the normalizing constant numerically so that you can plot the posterior density.
Q4
Anonymous Kraken   Created at: 2022-03-28 06:16
I obtained an estimate of alpha equal to 1 (100%). I wonder if that is a reasonable answer? Based on my computations, the posterior density seems to be concentrated between 1.15 and 1.25, so the interval from 1.1 to 1.3 seems too wide.
Anonymous Kraken   Created at: 2022-03-28 06:19
The estimate of alpha I obtained is 99.999999770922776% (approximately equal to 1).
Anonymous Orangutan   Created at: 2022-03-28 09:29
How about Q5? Did you also get a small probability $p(\theta \in \Theta_0 \mid x)$? I got 0.08 (8%) and am confused by this result.
[TA] Di Su   Created at: 2022-03-29 10:13
Yes, your result of $\hat{\alpha}$ is reasonable. It is a good observation that the interval $(1.1,1.3)$ is "too wide". So, to make things fun, you may try shorter intervals and see what happens.
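A sketch of that experiment: compute the confidence of an interval $[a,b]$ as a Riemann sum of the normalized posterior, then try intervals of different lengths. (Simulated data stands in for A4Q1.csv, so the numbers will differ from the assignment's.)

```r
set.seed(1)
x <- rnorm(20, mean = 1.2, sd = 1.2)   # stand-in for A4Q1.csv
n <- length(x); d <- 0.001
theta <- seq(0.5, 3, by = d)

# normalized posterior on the grid (log trick + Riemann sum)
lp   <- dexp(theta, rate = 2, log = TRUE) - n * log(theta) -
        sum(x^2) / (2 * theta^2) + sum(x) / theta
post <- exp(lp - max(lp)); post <- post / sum(post * d)

# confidence of an interval [a, b]: Pr(a <= theta <= b | x)
alpha_hat <- function(a, b) sum(post[theta >= a & theta <= b]) * d

alpha_hat(0.9, 1.6)    # a wide interval: confidence close to 1
alpha_hat(1.19, 1.21)  # a very short interval: noticeably smaller confidence
```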
[TA] Di Su   Created at: 2022-03-29 12:21
Regarding Q5, you may find the large "pseudo-size" $\alpha_1/(\alpha_0+\alpha_1)=99\%$ confusing. You can refer to Remark 4.1 in the lecture notes for more explanation. Roughly speaking, $\alpha_1/(\alpha_0+\alpha_1)$ is chosen manually; it reflects our attitude toward type-I and type-II errors and is subjective in this sense. When $\alpha_1/(\alpha_0+\alpha_1)=99\%$, we are harsh on $H_0$: even a slight deviation from $H_0$ will make us reject. But $\hat{p}_0$ is given by the model and the data, which is more objective. Hence a small $\widehat{p}_0$ together with a large $\alpha_1/(\alpha_0+\alpha_1)$ roughly means that, under our (subjective) criterion, the (objective) data do not support $H_0$. Since we found the clash between the subjective criterion and the objective data confusing, the lesson is to select a reasonable loss that is not too subjective. In practice, however, we may sometimes want a loss that favors $H_1$. For example, when testing $H_0$: a new drug is harmless, vs. $H_1$: the new drug is harmful, we would rather reject a harmless drug than put a harmful drug into use.
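If it helps, the decision rule being discussed is two lines of R. (This is a sketch: "reject $H_0$ when $\hat{p}_0 < \alpha_1/(\alpha_0+\alpha_1)$" is my paraphrase of the lecture notes, and the numbers are the ones mentioned in this thread.)

```r
alpha0 <- 1; alpha1 <- 99   # loss weights giving pseudo-size 99%
p0_hat <- 0.08              # posterior probability of H0 (value from the thread)

# harsh on H0: keep H0 only if it carries at least 99% posterior probability
reject <- p0_hat < alpha1 / (alpha0 + alpha1)
reject                      # TRUE here: the data do not support H0
```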
[Instructor] Kin Wai (Keith) Chan   Created at: 2022-03-30 11:08
It is possible to have a confidence estimator $\widehat{\alpha}\approx 100\%$ or even $\widehat{\alpha}\approx 0\%$. Operationally, there is no problem. It does not violate any requirements.
Q4
Anonymous Orangutan   Last Modified: 2022-03-28 09:04
I have a question about Q4. Since our posterior is highly concentrated around $\theta = 1.2$, we get $\hat{\alpha}_\pi = p(\theta \in [1.1,1.3] \mid x) = 1$ in R. Is this a possible answer in practice? (Sorry for asking the same question again; I thought I had replied in the thread, and I don't know how to delete.)
[TA] Di Su   Created at: 2022-03-29 10:20
It is possible, but there are tradeoffs. As an extreme example, consider $I=(-\infty,\infty)$. The confidence is surely 1, but this interval doesn't give us any information.
Q6: Modified prior
Anonymous Orangutan   Created at: 2022-03-28 10:38
In Example 4.9, $\pi_1(x) = \texttt{dnorm}(x, 0, \sqrt{\tau^2+\sigma^2})$ because $x \sim \text{N}(\theta, \sigma^2)$. In this question, however, $x \sim \text{N}(\theta, \theta^2)$, so if we set $\pi_1(\theta) = \text{N}(1,\tau^2)$, then $\pi_1(x) = \texttt{dnorm}(x, 1, \sqrt{\tau^2+\theta^2})$ cannot be used: $\theta$ has not been integrated out. So do we have to compute $\pi_1(x)$ by direct computation? My questions are: what is the space $\Theta_1$, and is it OK for it to include $\theta = 1.2$? In addition, is it possible to compute this tedious integral? Thanks.
[TA] Di Su   Last Modified: 2022-03-29 10:37
You are correct that we cannot directly use the formula in Example 4.9, because in this question the prior and the likelihood are not conjugate. Here are some hints:
Q6 prior
Anonymous Hyena   Last Modified: 2022-04-01 09:53
Since it is difficult to find $\pi_1(x)$ analytically, we need to use a Riemann sum. However, if I let $\theta$ follow an unbounded distribution such as N(1,1), whose range is $(-\infty,\infty)$, it is impossible to compute the Riemann sum. So is it better to set the prior to be a bounded distribution, such as a truncated distribution? Thank you^^
[TA] Di Su   Last Modified: 2022-03-30 10:27
In $\texttt{R}$ you can use an interval $I$ with finite endpoints such that $\Pr(\theta\in I\mid x_{1:n})\approx1$.
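For instance, here is a Riemann-sum sketch of $\pi_1(x)=\int f(x_{1:n}\mid\theta)\,\pi_1(\theta)\,d\theta$ under a hypothetical prior $\pi_1(\theta)=\text{N}(1,\tau^2)$ with $\tau=0.5$ (simulated data in place of A4Q1.csv; the bounded interval $I$ is chosen to carry essentially all of the prior mass while keeping $\theta>0$ so that $\text{sd}=\theta$ is valid):

```r
set.seed(1)
x   <- rnorm(20, mean = 1.2, sd = 1.2)  # stand-in for A4Q1.csv
tau <- 0.5                              # hypothetical prior sd
d   <- 0.001
theta <- seq(0.01, 4, by = d)           # bounded I; N(1, 0.25) mass outside is ~0

# log f(x | theta) at each grid point, then pi_1(x) by a Riemann sum
log_lik <- sapply(theta, function(t) sum(dnorm(x, mean = t, sd = t, log = TRUE)))
pi1_x   <- sum(exp(log_lik) * dnorm(theta, mean = 1, sd = tau)) * d
pi1_x                                   # a small positive number
```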
[Instructor] Kin Wai (Keith) Chan   Created at: 2022-03-30 11:05
Yes, you are right. You may find my comments below useful:
Anonymous Llama   Created at: 2022-03-30 21:48
As mentioned by the TAs and the professor, I think the main technique to learn from this assignment is how to handle posterior or integral computations for unnamed distributions, or for distributions with unbounded support. What these cases have in common is that the density is essentially summarized by a certain range of the parameter; outside that range the density is so small that we can ignore it. Our job is just to find an appropriate range by trial and error. What we learned in A3 is also important: sometimes we should take logs to work around R's numerical limits when intermediate values are too small, even though the target value itself is representable and its log version is computable in R.
[Instructor] Kin Wai (Keith) Chan   Created at: 2022-04-01 10:05
Yes, you are right! Indeed, the whole STAT4010 is a combination of three things:
Q1
Anonymous Jackal   Last Modified: 2022-03-30 13:12
Let me ask something stupid: are we plotting the posterior against theta or against x? I am confused because theta could be anything in an indefinite range (but with different probabilities).
Anonymous Manatee   Created at: 2022-03-30 15:42
The posterior is plotted against theta. You may do some trials to find a proper range for theta.
Q2
Anonymous Loris   Created at: 2022-03-30 15:17
For Q2, do we need to derive the analytical form of the MAP? Or can we directly calculate the mode of the posterior density in R?
[TA] Di Su   Created at: 2022-03-30 16:10
You only need to find the MAP numerically.
About Q5 and Q6
Anonymous Hyena   Created at: 2022-03-30 21:19
Suppose we obtain different results from Q5 and Q6: for example, we reject H0 in Q5 but do not have enough evidence to reject H0 in Q6. That seems contradictory. Why can this happen? Thank you^^
[TA] Di Su   Created at: 2022-03-31 12:59
It is a good question! There are several reasons:
[Instructor] Kin Wai (Keith) Chan   Last Modified: 2022-03-31 20:42
Di's answer is very clear. I would like to add one more point. The Bayes factor (BF) is a coherent measure in the following sense. Consider three hypotheses $H_j:\theta\in\Theta_j$ for $j=1,2,3$, and let $B_{ij}$ be the BF of $H_i$ against $H_j$. Then we have, e.g., \[ B_{13} = B_{12}B_{23}. \]
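To see why, write each BF as a ratio of posterior odds to prior odds (a one-line check, using only the definition):
\[
B_{12} B_{23}
= \frac{\Pr(H_1 \mid x)/\Pr(H_2 \mid x)}{\Pr(H_1)/\Pr(H_2)}
\cdot
\frac{\Pr(H_2 \mid x)/\Pr(H_3 \mid x)}{\Pr(H_2)/\Pr(H_3)}
= \frac{\Pr(H_1 \mid x)/\Pr(H_3 \mid x)}{\Pr(H_1)/\Pr(H_3)}
= B_{13},
\]
since all the $H_2$ terms cancel.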