Exercise 4.1
[TA] Di Su   Created at: 0000-00-00 00:00   A4Ex4.1
Related last year's exercise and discussion can be found here. Exercise 1 (Testing and region estimation (60%)). Let \[\begin{aligned} \left[ x_1, \ldots, x_n \mid \theta \right] &\overset{\text{iid}}{\sim} \text{N}(\theta, \theta^2) , \\ \theta &\sim \theta_0\,\text{Exp}(1),\end{aligned}\] where \(\theta_0=0.5\). Suppose the dataset A4Q1.csv is observed. The goal is to perform inference on \(\theta\).
Difficulty rating: 3.7 (3 votes)
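Since the posterior of \(\theta\) here has no named form, a minimal numerical sketch (not a model answer) is to evaluate the log-kernel on a grid and normalize it numerically; it assumes A4Q1.csv holds a single column of observations, and the grid range is arbitrary:

x  <- read.csv("A4Q1.csv")[, 1]                 # assumed one-column file; adjust as needed
t0 <- 0.5                                       # theta_0 in the prior theta ~ theta_0 * Exp(1)
theta <- seq(0.01, 5, length.out = 2000)        # grid over theta > 0; widen if needed
logpost <- sapply(theta, function(t)
  sum(dnorm(x, mean = t, sd = t, log = TRUE)) + # log-likelihood: x_i | theta ~ N(theta, theta^2)
  dexp(t, rate = 1 / t0, log = TRUE))           # log prior density of theta_0 * Exp(1)
logpost <- logpost - max(logpost)               # stabilize before exponentiating
post <- exp(logpost) / sum(exp(logpost) * (theta[2] - theta[1]))   # approximate posterior density
plot(theta, post, type = "l", xlab = expression(theta), ylab = "posterior density")

The same grid can then be reused for testing and region estimation, e.g., by accumulating posterior probabilities over the grid.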
question 1
Anonymous Gopher   Created at: 2023-03-24 14:52
In the first question, I fail to find a named distribution for the posterior density. Can I just leave the kernel as the final answer? Thank you!
About Chapter 3
[Developer] Kai Pan (Ben) Chu   Created at: 0000-00-00 00:00   Chp3General
Questions and discussion related to Chapter 3 should be placed here.
Tutorial 4 Example 2.2
Anonymous Orangutan   Created at: 2022-02-17 17:39
I have trouble understanding the standardization step. Why is $E(\theta_j|y_{1:n}) = \mu_j -\sigma_j \frac{dnorm(u_j)-dnorm(l_j)}{pnorm(u_j)-pnorm(l_j)}$? 1. Why can't we simply use $\mu_j$? 2. Why is the term $\frac{dnorm(u_j)-dnorm(l_j)}{pnorm(u_j)-pnorm(l_j)}$ used in the standardization ($u_j$ and $l_j$ are unclear to me)? Sorry for my abstract question. Thanks
[TA] Di Su   Last Modified: 2022-02-18 17:48
Thanks for your questions. It comes from Tutorial 4 Example 2.2 Q2. (Please kindly mention which question you are referring to so that others can also take a look :) )
Anonymous Monkey   Created at: 2022-03-11 13:48
So how can we arrive at that equation? Not quite sure how to deal with it to get the pnorm formula.
[TA] Di Su   Created at: 2022-03-23 21:50
Let's use TN$(\mu,\sigma^2;a,b)$ to denote the truncated normal distribution that restricts $N(\mu,\sigma^2)$ on $[a,b]$. Assume $X\sim$ TN$(\mu,\sigma^2;a,b)$. Then $f(x)\propto\phi(x;\mu,\sigma^2),x\in[a,b]$ where $\phi(\cdot;\mu,\sigma^2)$ denotes the density of a normal random variable with mean $\mu$ and variance $\sigma^2$. We have the following observations:
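Beyond those observations, the formula in the original question can be checked quickly by simulation; a small sketch with arbitrary values of \(\mu, \sigma, a, b\) (not taken from the tutorial):

mu <- 1; sigma <- 2; a <- 0; b <- 3
l <- (a - mu) / sigma; u <- (b - mu) / sigma     # standardized endpoints
formula_mean <- mu - sigma * (dnorm(u) - dnorm(l)) / (pnorm(u) - pnorm(l))
set.seed(1)
z <- rnorm(1e6, mu, sigma)
mc_mean <- mean(z[z >= a & z <= b])              # average of draws falling inside [a, b]
c(formula = formula_mean, monte_carlo = mc_mean) # the two values should agree closely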
Questions Regarding Tutorial 5
Anonymous Grizzly   Last Modified: 2022-02-27 22:04
$\textbf{Proof of Theorem 1.3}$ $$\text{R}(\pi,\widehat{\theta}_\pi)=\int_\mathscr{X}\int_\Theta \left[\widehat{\theta}_\pi-\theta\right]^2 dF(\theta\mid x)dF(x)$$ Why can we write $dF(\theta\mid x) dF(x)$? Is it the same as $dF(x\mid\theta)dF(\theta)$?
$\textbf{Example 1.1}$ $$\widehat{\theta}_\pi=\text{E}[\theta\mid x_{1:n}]=\frac{\tau_0^2}{\sigma^2+\tau_0^2}\overline{x}+\frac{\sigma^2}{\sigma^2+\tau_0^2}\nu_0$$ Why is $n$ missing in the above expression?
$\textbf{Example 1.2}$ $$\rho(a,b)\geq\rho\left(0,-b/(a-1)\right) \quad\text{for } a<0$$ $$\therefore a\overline{x}+b\quad\text{is uniformly dominated by } -b/(a-1) \text{ for } a<0.$$ If we substitute $a=0$ into $\rho\left(0,-b/(a-1)\right)$, is $-b/(a-1)=-b/(0-1)=b$? Therefore, $a\overline{x}+b$ is uniformly dominated by $b$ for $a<0$. Is that correct? Thank you very much.
[TA] Cheuk Hin (Andy) Cheng   Created at: 2022-02-27 23:20
Thanks for the questions.
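For the question about Example 1.1, a reference computation may help. Assuming the standard conjugate setup \(x_1,\ldots,x_n\mid\theta \overset{\text{iid}}{\sim} \text{N}(\theta,\sigma^2)\) with \(\theta\sim\text{N}(\nu_0,\tau_0^2)\) (the lecture notes may parametrize this differently), the posterior mean is \[ \text{E}(\theta\mid x_{1:n}) = \frac{n\tau_0^2}{\sigma^2+n\tau_0^2}\,\overline{x} + \frac{\sigma^2}{\sigma^2+n\tau_0^2}\,\nu_0, \] which reduces to the displayed expression when \(n=1\), or when \(\sigma^2\) denotes the variance of \(\overline{x}\) rather than of a single observation.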
Mock 2021 spring Q3
Anonymous Orangutan   Created at: 2022-03-08 19:04
How can we derive A2 and A3? Although I can verify that this representation is correct by looking at the answer, I couldn't come up with this idea myself.
Anonymous Orangutan   Created at: 2022-03-09 18:30
I asked in the tutorial and tried, but still couldn't get the same form. Please explain again or write down the procedure. Thanks
[TA] Di Su   Last Modified: 2022-03-09 22:04
Consider the two steps:
Q3 Spring 2021 Mock
Anonymous Orangutan   Created at: 2022-03-09 09:54
I tried to simulate the answer, but couldn't get the same answer. 1. Where are my code mistakes? 2. Some researchers find that $\theta$ is always positive. Is the new prior $\mathbb{1}(0<\theta_0<\theta_1)$ reasonable to mitigate the strong previous prior?
[TA] Di Su   Last Modified: 2022-03-23 22:33
2021 Spring Mock Midterm Q3.3: The for loop over $\texttt{i.t1}$ is not correct, as it violates the requirement that $\theta_0<\theta_1$. You may modify that loop accordingly (there are also other ways to write the for loop). It is good practice to try the codes yourself, keep it up!
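Purely as an illustration (the grid t.grid and the index names i.t0, i.t1 below are assumptions, not the mock exam's actual code), one way to respect \(\theta_0<\theta_1\) inside a double loop is to skip the invalid pairs:

t.grid <- seq(0.1, 5, by = 0.1)                  # hypothetical grid of candidate values
for (i.t0 in seq_along(t.grid)) {
  for (i.t1 in seq_along(t.grid)) {
    if (t.grid[i.t1] <= t.grid[i.t0]) next       # enforce theta_0 < theta_1
    ## ... evaluate the quantity of interest at (t.grid[i.t0], t.grid[i.t1]) here ...
  }
}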
2019 M0 BE
Anonymous Orangutan   Created at: 2022-03-10 07:52
I am not sure why IV is correct. Are the Bayes estimators in this question 2, 3 and 6?
[TA] Di Su   Last Modified: 2022-03-10 11:42
Yes.
Cramér–Rao bound, Sufficiency and Completeness
Anonymous Moose   Created at: 2022-03-28 15:46
In STAT4003, we have learnt the Cramér–Rao bound, sufficiency and completeness to assess the quality of an estimator or, in general, a statistic. But in this course we mainly focus on decision theory. How do those concepts relate to each other? In particular, I find the concept of completeness really confusing. Why do we need such a definition in practice?
[TA] Cheuk Hin (Andy) Cheng   Last Modified: 2022-03-29 00:13
This question is more about STAT4003. Since it is out of the syllabus, I will answer it briefly. First of all, completeness is a property of a statistic in relation to a family of probability distributions. It roughly says that no non-trivial function of the statistic has mean 0 unless the function itself is 0. Say our parameter of interest is $\theta$. One motivation for considering a complete sufficient statistic (CSS) is the fact that a minimal sufficient statistic (MSS, the simplest form of sufficient statistic, SS) is not necessarily independent of an ancillary statistic (AS, a statistic that contains no information about $\theta$). This is counter-intuitive because we would expect a sufficient statistic to contain “all” the information about $\theta$ and thus to be independent of any ancillary statistic. Indeed, we need a stronger condition on the MSS so that it is independent of the AS, and this is where completeness stands out (take a look at Basu's theorem). Note also that CSS implies MSS under very mild assumptions. Some people also view completeness as a tool, or a stronger assumption, for proving stronger results about the (sufficient) statistic they are interested in.
I think completeness can be related to decision theory. Recall that under that framework, the frequentist makes the decision that minimizes the risk. One use of completeness together with decision theory is the Lehmann–Scheffé theorem: if $T(x)$ is a CSS and the unbiased statistic $g(x)$ depends on the data only through $T(x)$, then $g(x)$ is UMRUE. Here, we have used decision theory in the following sense: we specify our utility or loss, then we make the decision that minimizes the expected loss or risk (in the above case, a convex loss with a very large penalty on bias). The Lehmann–Scheffé theorem is also an example of why we care about CSS in practice, since it tells us the “best” point estimator.
About Invariance
Anonymous Bobolink   Created at: 2023-02-26 00:37
We learned the definition of invariance and talked about how to construct an invariant prior. It essentially means the prior stays the same under reparametrization. But so far we haven't actually used reparametrization, and I don't know in which situations we should use it. So could I ask what the practical benefits of invariance are?
Tutorial 5 Example 2.1
Anonymous Dolphin   Created at: 2023-03-20 21:23
To prove that a Bayes estimator is minimax, can I prove it by claiming that the frequentist risk is constant over all theta? Given a prior and the corresponding Bayes estimator, the assumption that the Bayes risk is always larger than or equal to the frequentist risk for any theta suggests that the frequentist risk is constant, because the Bayes risk is the average of the frequentist risk with respect to theta. If the average is always larger than or equal to every possible value of a random variable, then the random variable has to be constant.
Exercise 3.2
[TA] Cheuk Hin (Andy) Cheng   Created at: 0000-00-00 00:00   A3Ex3.2
Let \([x \mid \theta]\) follow any sampling distribution with any prior for \(\theta\). Denote \(\phi_j = g_j(\theta)\) for \(j=1, \ldots, J\), where \(g_1, \ldots, g_J\) are some known functions and \(J\in\mathbb{N}\) is fixed. Suppose that, under the loss function \(L(\theta, \widehat{\theta}) = (\widehat{\theta} - \theta )^2/\theta^2 ,\) the Bayes estimator of \(\phi_j\) is \(\widehat{\phi}_j\) for each \(j\). The parameter of interest is \[\phi := \sum_{j=1}^J w_j \phi_j, \] where \(w_1, \ldots, w_J\in\mathbb{R}\) are known and fixed. Derive the Bayes estimator of \(\phi\). Hints: See Remark 2.2. Don’t read the hints unless you have no ideas and have tried for more than 15 mins.
Difficulty rating: 5 (2 votes)
Loss function
Anonymous Pinniped   Last Modified: 2023-03-03 01:44
May I ask whether $L(\Phi,\hat{\Phi}) = (\hat{\Phi} - \Phi)^2/\Phi^2$? Thank you.
[Instructor] Kin Wai (Keith) Chan   Created at: 2023-03-03 10:57
Good catch. You are right. For any function $h$, the loss for estimating $\xi = h(\theta)$ by $\widehat{\xi}$ is \[ L(\theta, \widehat{\xi}) = (\widehat{\xi}-\xi)^2/\xi^2. \] Note that I always put the full model parameter $\theta$ in the first argument to emphasize the dependence on $\theta$.
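For reference, combining this loss with the technique of Example 3.13 (weighted quadratic loss with weight \(w(\xi)=\xi^{-2}\)) gives the general form \[ \widehat{\xi}_\pi = \frac{\mathsf{E}\{w(\xi)\,\xi\mid x\}}{\mathsf{E}\{w(\xi)\mid x\}} = \frac{\mathsf{E}(\xi^{-1}\mid x)}{\mathsf{E}(\xi^{-2}\mid x)}; \] this is only a pointer to the hinted technique, not the full solution of the exercise.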
Exercise 3.1 (P6-10)
[TA] Cheuk Hin (Andy) Cheng   Created at: 0000-00-00 00:00   A3Ex3.1
Consider an online platform that sells secondhand goods. Let \(y\) be the known official price of a brand new product ABC. On the platform, there are \(n\) secondhand items of product ABC having a similar condition (e.g., “like new”). Denote their prices by \(x_1, \ldots, x_n\). Assume the following model: \[\begin{aligned} \left[ x_1, \ldots, x_n \mid \theta \right] &\overset{\text{IID}}{\sim} Ga(v_0)/\theta, \end{aligned}\] where \(v_0>0\) is fixed. Assume that we have a prior belief that \(\theta\) can only take values \(t_1, \ldots, t_J\) with probability \(p_1, \ldots, p_J\), where \(t_1, \ldots, t_J, p_1, \ldots, p_J>0\) and \(J>1\) are fixed numbers such that \(p_1 + \cdots + p_J = 1\). Set \(v_0 = 1.4\) and \(J=10\). For each \(j=1, \ldots, J\), let \[t_j = \frac{j}{1000 J} \qquad \text{and} \qquad p_j \propto \left\{ 0.8 - \left\vert \frac{j}{J}-0.8 \right\vert \right\}, \qquad j=1, \ldots, J. \] Suppose that \(y=28888\) and a dataset (A3.txt) of size \(n=80\) is observed. We are interested in \[\phi = \frac{E(x_1\mid \theta)-y}{y}.\]
6. (10%) Write an R function BE(x, q=0, t, p, v0) to compute \(\widehat{\theta}_{q}\), where the inputs are as follows: x — a vector storing \((x_1, \ldots, x_n)\); q — the order \(q\) of the loss function (by default, set q=0); t — a vector storing \((t_1, \ldots, t_J)\); p — a vector storing \((p_1, \ldots, p_J)\); v0 — the value of the hyperparameter \(v_0\).
7. (10%) In parts (7)–(9), consider the loss \(L_1(\theta, \widehat{\theta})\). Let \(\mathcal{S} = \{\widehat{\theta}_{0},\widehat{\theta}_{1},\ldots, \widehat{\theta}_{4}\}\) be the set of estimators of interest. Use simulation to plot the risk functions of \(\widehat{\theta}_{0},\widehat{\theta}_{1},\ldots, \widehat{\theta}_{4}\) against \(\theta\) on the same graph.
8. (10%) Using the graph in part (7), determine whether the estimators in \(\mathcal{S}\) are admissible.
9. (10%) Using the graph in part (7), find the minimax estimator of \(\theta\) among \(\mathcal{S}\).
10. (10%) The above prior is discrete with support \(\{t_1, \ldots, t_J\}\). Andy claimed that it is not reasonable. Instead, he modified the prior so that \(\theta\) follows \(t_j Ga(\tau_0)/\tau_0\) with probability \(p_j\) for each \(j=1, \ldots, J\), where \(\tau_0>0\) is fixed. Select an appropriate value of the hyperparameter \(\tau_0\). Using this prior, suggest an estimator of \(\phi\). Do you prefer this estimator or the estimator in part (4)? Why? (Use \(\lesssim 50\) words.)
Hints: See Remark 2.1. Don’t read the hints unless you have no ideas and have tried for more than 15 mins.
Difficulty rating: 4.5 (6 votes)
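Since part (7) asks for simulated risk curves, here is a generic sketch (not the assignment solution) of estimating a risk function by Monte Carlo; the estimator and loss below are toy placeholders rather than BE and \(L_1\):

risk_curve <- function(theta.grid, n, estimator, loss, nrep = 1000, v0 = 1.4) {
  sapply(theta.grid, function(th) {
    losses <- replicate(nrep, {
      x <- rgamma(n, shape = v0, rate = th)    # x_i | theta ~ Ga(v0)/theta, i.e. Gamma(shape v0, rate theta), assuming Ga(v0) has unit rate
      loss(th, estimator(x))
    })
    mean(losses)                               # Monte Carlo estimate of R(theta, estimator)
  })
}
theta.grid <- seq(0.0001, 0.001, length.out = 20)        # roughly the range of the t_j's
toy.est  <- function(x) 1.4 / mean(x)                    # placeholder estimator, NOT the Bayes estimator
toy.loss <- function(theta, th.hat) (th.hat - theta)^2   # placeholder loss, NOT L_1
r <- risk_curve(theta.grid, n = 80, toy.est, toy.loss)
plot(theta.grid, r, type = "l", xlab = expression(theta), ylab = "estimated risk")

Repeating this for each estimator in \(\mathcal{S}\) (adding curves with lines()) gives the overlaid risk plot requested in part (7).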
Question 6
Anonymous Bison   Created at: 2023-03-02 00:16
For question 6: according to the hint about adding an adjustment inside the log(), do we need to adjust back? E.g., with exp(log(theta) - min(theta)), how can we extract the exp(min(theta)) out? Or is it negligible when we have a fraction of one expectation over another?
[TA] Cheuk Hin (Andy) Cheng   Created at: 2023-03-02 09:11
Observe that $\max_{\theta} p_0(\theta)$ is a constant free of $\theta$. Moreover, $b = \exp(\log(b))$. Therefore, the posterior satisfies $$f(\theta\mid X) \propto \exp\{\log(p_0(\theta)) - \log(\max_{\theta} p_0(\theta)) \}. $$ Hence the extra adjustment term can be absorbed into the normalizing constant when we compute the posterior probabilities.
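A tiny numerical illustration of this point (the log-kernel values below are made up): subtracting the maximum is a constant shift, so it cancels upon normalization while preventing underflow.

log.kernel <- c(-1805, -1802, -1810, -1801)      # hypothetical values of log p_0(t_j)
naive      <- exp(log.kernel)                    # underflows to 0 in double precision
stable     <- exp(log.kernel - max(log.kernel))  # largest term becomes exp(0) = 1
post       <- stable / sum(stable)               # the constant shift cancels here
naive
post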
[Instructor] Kin Wai (Keith) Chan   Last Modified: 2023-03-02 13:04
It is a good question. Let me explain in the following way:
Exercise 3.1 (P1-5)
[TA] Cheuk Hin (Andy) Cheng   Created at: 0000-00-00 00:00   A3Ex3.1
Consider an online platform that sells secondhand goods. Let \(y\) be the known official price of a brand new product ABC. On the platform, there are \(n\) secondhand items of product ABC having a similar condition (e.g., “like new”). Denote their prices by \(x_1, \ldots, x_n\). Assume the following model: \[\begin{aligned} \left[ x_1, \ldots, x_n \mid \theta \right] &\overset{\text{IID}}{\sim} Ga(v_0)/\theta, \end{aligned}\] where \(v_0>0\) is fixed. Assume that we have a prior belief that \(\theta\) can only take values \(t_1, \ldots, t_J\) with probability \(p_1, \ldots, p_J\), where \(t_1, \ldots, t_J, p_1, \ldots, p_J>0\) and \(J>1\) are fixed numbers such that \(p_1 + \cdots + p_J = 1\). Set \(v_0 = 1.4\) and \(J=10\). For each \(j=1, \ldots, J\), let \[t_j = \frac{j}{1000 J} \qquad \text{and} \qquad p_j \propto \left\{ 0.8 - \left\vert \frac{j}{J}-0.8 \right\vert \right\}, \qquad j=1, \ldots, J. \] Suppose that \(y=28888\) and a dataset (A3.txt) of size \(n=80\) is observed. We are interested in \[\phi = \frac{E(x_1\mid \theta)-y}{y}.\]
Difficulty rating: 4.1 (9 votes)
Question 5
Anonymous Mouse   Created at: 2023-02-24 15:38
For question 5, does it suffice to write the Bayes estimator as a fraction of two expectations, $E(\theta^{1-q}\mid x)/E(\theta^{-q}\mid x)$, or can it be simplified even more?
[TA] Cheuk Hin (Andy) Cheng   Created at: 2023-02-24 17:36
If the answer cannot be simplified further, then it is acceptable. This reply does not imply whether your answer is correct or not.
question about 1 and 2
Anonymous Gopher   Created at: 2023-03-01 14:59
May I know whether we should write down the answer after normalizing in questions 1 and 2? Or is it OK to write the answer without normalizing, using the proportionality notation?
[TA] Cheuk Hin (Andy) Cheng   Created at: 2023-03-02 08:52
If the normalizing constant can be easily found, then you should also derive it. Moreover, the derivation may be useful in the next parts.
[Instructor] Kin Wai (Keith) Chan   Created at: 2023-03-02 19:06
Good question! Indeed, I think it is a good learning moment for all of you.
Question 4
Anonymous Bison   Created at: 2023-03-01 16:43
Can I ask: if we are finding a point estimator of phi, should we use E(phi_hat) = phi (unbiasedness)? Or is any other direction recommended?
[TA] Cheuk Hin (Andy) Cheng   Created at: 2023-03-02 08:56
In this question, you can propose any point estimator that you like. However, you need to give the reasons why you suggest that point estimator in your answer. For details, you may refer to Section 3.3.2 in the lecture notes.
[Instructor] Kin Wai (Keith) Chan   Created at: 2023-03-02 18:57
It is a good question.
Example 3.13
[Student helper] Martin Lee   Created at: 0000-00-00 00:00   Chp3Eg13
Example 3.13 (Weighted quadratic loss ${\color{blue}\star}{\color{blue}\star}{\color{gray}\star}$). Consider the loss function: \(L(\theta, \widehat{\theta}) = w(\theta)(\widehat{\theta}- \theta)^2\), where \(w(\cdot)\) is a non-negative weight function. Prove that the Bayes estimator is \[\widehat{\theta}_{\pi} = \frac{\mathsf{E}\left\{ w(\theta) \theta\mid x \right\}}{\mathsf{E}\left\{ w(\theta)\mid x \right\}} .\] $\bigstar~$Solution:
Difficulty rating: 4 (1 vote)
Question
Anonymous Mouse   Created at: 2023-02-18 20:51
Can someone please explain why the expression “A” equals 0?
Anonymous Mosquito   Created at: 2023-02-23 23:38
$\hat{\theta}_\pi$ is defined as $E\{w(\theta)\theta\mid x\}/E\{w(\theta)\mid x\}$ in the question. Plug $\hat{\theta}_\pi$ into A and you will get zero.
Anonymous Hawk   Created at: 2023-03-03 13:46
Can you explain more clearly?
Anonymous Hawk   Created at: 2023-03-03 13:47
Does the question really define it? Isn't it just asking us to prove that the BE has that form?
[TA] Di Su   Last Modified: 2023-03-16 17:40
We have defined A to be the term $\mathrm{E}\left[w(\theta)\left(\theta-\hat{\theta}_\pi\right) \mid x\right]$. And we see \begin{align*}\mathrm{E}\left[w(\theta)\left(\theta-\hat{\theta}_\pi\right) \mid x\right] &=\mathrm{E}\left[w(\theta)\left(\theta-\frac{E\{w(\theta) \theta \mid x\}}{E\{w(\theta) \mid x\}}\right) \mid x\right]\\ &=\mathrm{E}\left[w(\theta)\theta\mid x\right]-\mathrm{E}\left[w(\theta)\frac{E\{w(\theta) \theta \mid x\}}{E\{w(\theta) \mid x\}}\mid x\right]\\ &=\mathrm{E}\left[w(\theta)\theta\mid x\right]-E\{w(\theta) \theta \mid x\}\mathrm{E}\left[\frac{w(\theta)}{E\{w(\theta) \mid x\}}\mid x\right]\\ &=\mathrm{E}\left[w(\theta)\theta\mid x\right]-E\{w(\theta) \theta \mid x\}\frac{\mathrm{E}\left[w(\theta)\mid x\right]}{E\{w(\theta) \mid x\}}\\ &=\mathrm{E}\left[w(\theta)\theta\mid x\right]-E\{w(\theta) \theta \mid x\}\\ &=0.\end{align*}
Exercise 2.1
[TA] Di Su   Created at: 0000-00-00 00:00   A2Ex1
Related last year's exercise and discussion can be found here. Example 1 ${\color{red}\star}{\color{black}\star}{\color{black}\star}$ Different types of priors (50$\%$). Suppose that a new virus that causes a pandemic was recently discovered. A new vaccine against the new virus was just developed. The aim of this exercise is to estimate the vaccine’s efficacy \(\theta\), which is defined as \[\begin{aligned} \theta &= \frac{\pi_0 - \pi_1}{\pi_1}, \end{aligned}\] where \(\pi_0\) and \(\pi_1\) are the attack rates of unvaccinated and vaccinated humans, respectively. In a preclinical stage, the vaccine was tested on laboratory animals (not humans). The following data were obtained.
Hints:
Difficulty rating: 3.6 (12 votes)
question about the hint
Anonymous Gopher   Created at: 2023-02-11 17:41
Does the “proper” in hint 2 refer to a proper density? Or does it just mean “appropriate”?
Anonymous Peacock   Created at: 2023-02-12 10:06
I guess it means 'appropriate', similar to Q2 of Example 2.23 in Chp2's lecture note.
[Instructor] Kin Wai (Keith) Chan   Created at: 2023-02-13 19:01
Yes! It simply means “appropriate” in the layman's sense.
question about Exercise 2.1
Anonymous Gopher   Created at: 2023-02-12 16:26
Thank you peacock! I got the meaning! I still have one question: can I separately define the priors for $\pi_0$ and $\pi_1$, then multiply them to get the answer in all parts (a)–(d) of this question?
Anonymous Peacock   Created at: 2023-02-13 10:30
I did it this way because I think that the two groups are completely independent.
[Instructor] Kin Wai (Keith) Chan   Created at: 2023-02-13 19:05
It is a good question.
About "hardly denying information" in hint | |
Anonymous Bobolink   Created at: 2023-02-12 18:11 | 0 |
For this question, in my view, the only hardly denying information is just that the probability is in (0,1) which is meaningless. Maybe we should search for other information on the internet? | |
[Instructor] Kin Wai (Keith) Chan   Created at: 2023-02-13 19:15
Good question again.
question 2
Anonymous Warbler   Created at: 2023-02-17 16:31
For Emily's problem, can we consider her prior a weakly informative prior? For most vaccines created, the attack rate would go down. Even though Emily used human data to arrive at this belief, it is more like a common belief, which may not contradict the prohibition on using the same dataset to derive both the prior and the posterior distribution. Or has the belief already violated the rules? Thanks!
[TA] Di Su   Created at: 2023-02-17 19:14
Yes, you may regard it as a weakly informative prior. However, you may want to note that Emily proposed the prior after observing the $m+n$ observations in the clinical stage, so it is based on the data.
Exercise 2.2
[TA] Di Su   Created at: 0000-00-00 00:00   A2Ex2
Related last year's exercise and discussion can be found here. Example 2 ${\color{red}\star}{\color{red}\star}{\color{black}\star}$ Posterior of function of parameters (50$\%$). Consider the problem in Exercise 1. In this exercise, assume the following prior for \((\pi_0, \pi_1)\): \[f(\pi_0, \pi_1) = \mathbb{1}(\pi_0,\pi_1\in(0,1)).\]
Hints:
Difficulty rating: 4.6 (16 votes)
Question about Q2
Anonymous Peacock   Created at: 2023-02-13 10:16
I am not sure whether we should transform the posterior derived in Q1 to theta by the Jacobian directly, or whether we need to transform the sampling distribution and the prior to theta separately and then multiply them? Thanks
Anonymous Peacock   Created at: 2023-02-13 20:24
I think I've found out how to solve by representation. Please just ignore this question:)
Rewriting posterior
Anonymous Mouse   Created at: 2023-02-13 20:12
Let's say I successfully find f(x0|pi0) and f(x1|pi1). Then, since theta can be written in terms of pi0 and pi1, is f(x0,x1|theta) simply (f(x0|pi0)-f(x1|pi1))/f(x1|pi1), or is this mathematically invalid?
[TA] Di Su   Last Modified: 2023-02-13 23:15
Since $f(x_0,x_1\mid\theta)$ is the density of $x_0,x_1$, it is not the same as how $\theta$ is expressed in terms of $\pi_0$ and $\pi_1$.
question about the representation
Anonymous Gopher   Created at: 2023-02-14 00:38
When I use a representation for theta, theta is expressed in terms of 4 independent variables following gamma distributions (by the representation of the beta distribution). Is it OK to leave these 4 variables as the final answer? Or should I find a way to further simplify the result? Besides, if I successfully represent theta in terms of other fundamental variables, can I say that theta is already well defined? (Actually, I am confused about the meaning of “well defined” in the hint.)
[TA] Di Su   Last Modified: 2023-02-14 19:19
You can use Beta random variables in your representation, so there is no need to further represent them by Gamma random variables. And you are right, a fundamental random variable is already well-defined because we know its PDF and CDF (and the integration of its PDF is finite).
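A minimal sketch of how such a representation can be used in practice: draw the Beta random variables and transform them, which approximates the posterior of \(\theta\) by simulation. The Beta parameters below are placeholders, not the ones you should obtain in Q1.

set.seed(1)
pi0 <- rbeta(1e5, shape1 = 3, shape2 = 8)    # hypothetical posterior draws of pi_0
pi1 <- rbeta(1e5, shape1 = 2, shape2 = 9)    # hypothetical posterior draws of pi_1
theta <- (pi0 - pi1) / pi1                   # theta = (pi_0 - pi_1)/pi_1 as defined in Exercise 1
hist(theta, breaks = 100, freq = FALSE, main = "Approximate posterior of theta")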
Question about representation
Anonymous Mosquito   Created at: 2023-02-16 21:25
May I ask if theta is represented by Beta r.v.'s only? I tried to use Beta r.v.'s to do the operation but I cannot find a well-defined named distribution to represent it. Should I leave it there as a product of Beta random variables, or should I use the Jacobian if I can't find a well-defined distribution? For Q3, may I ask what is meant by the same distributional form? For example, if $X_1 \sim \text{Exp}(1)$, $X_2 \sim \text{Exp}(2)$, $Z \sim N(0,1)$, does $X_1+Z$ have the same distributional form as $X_2+Z$?
[TA] Di Su   Last Modified: 2023-02-17 19:18
For $\theta$, you can simply use Beta r.v.'s to represent it, and you don't need to derive the exact formula of its PDF. For Q3, your understanding is correct. Having the same distributional form means they come from the same distribution family with possibly different parameters.
Further question about "well defined"
Anonymous Pinniped   Created at: 2023-02-17 02:08
Thanks for the above answer. A fundamental variable is obviously well-defined. But why is it that “if a distribution can be expressed in terms of fundamental random variables, then it is also well-defined (i.e., the integration of its PDF is finite)”? How can we prove this?
Anonymous Swordtail   Last Modified: 2023-02-17 18:11
This is my proof. Please comment if I am wrong. Note that multiple random variables $X_1, …, X_n$ can be regarded as a single multi-dimensional random variable $X = (X_1, …, X_n)$. Let $X$ be a well-defined random variable with support $\mathcal{X}$. Let $Z=g(X)$, where $g$ is any function from $\mathcal{X}$ to $\mathbb{R}$. \begin{align*} P(Z≤z) &= P(g(X)≤z) \\ &= \int_{\mathcal{X}} P(g(X)≤z \mid X=x)\ f_X(x) dx \\ &= \int_{\mathcal{X}} P(g(x)≤z)\ f_X(x) dx \\ &= \int_{\mathcal{X}} \mathbb{1}_{\{g(x)≤z\}}\ f_X(x) dx \\ &≤ \int_{\mathcal{X}} f_X(x) dx \\ &= 1 \end{align*} Therefore, the CDF of $Z$ is within $[0, 1]$, meaning that $Z$ is also well-defined.
Example 1.11
[Student helper] Martin Lee   Created at: 0000-00-00 00:00   Chp1Eg11
Example 1.11 (${\color{blue}\star}{\color{gray}\star}{\color{gray}\star}$ Bernoulli-normal (A1 Fall 2019)). Let \(\sigma>0\) be a known constant, and \[\begin{aligned} \left[x_1, \ldots, x_n \mid \theta\right] &\overset{\text{iid}}{\sim} \text{N}(2\theta+1, \sigma^2) ;\\ \theta &\sim \text{Bern}(1/2).\end{aligned}\]
Difficulty rating: 3 (1 vote)
I am confused about how this equation in the answer was derived.
Anonymous Peacock   Created at: 2023-02-17 01:28
$\mathrm{Cov}(x_1, x_{n+1}) = \mathrm{E}\{\mathrm{Cov}(x_1, x_{n+1} \mid \theta)\} + \mathrm{Cov}\{\mathrm{E}(x_1 \mid \theta), \mathrm{E}(x_{n+1} \mid \theta)\}$
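For reference, this is the law of total covariance, which holds whenever the second moments exist. In Example 1.11, if \(x_{n+1}\) denotes a further observation from the same model, then \(x_1\) and \(x_{n+1}\) are independent given \(\theta\), so the first term vanishes and \[ \mathrm{Cov}(x_1, x_{n+1}) = \mathrm{Cov}\{\mathrm{E}(x_1\mid\theta), \mathrm{E}(x_{n+1}\mid\theta)\} = \mathrm{Cov}(2\theta+1,\, 2\theta+1) = 4\,\mathrm{Var}(\theta). \]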
Example 2.2
[Student helper] Martin Lee   Created at: 0000-00-00 00:00   Chp2Eg2
Example 2.2 (${\color{blue}\star}{\color{gray}\star}{\color{gray}\star}$ Transforming a location parameter to a scale parameter). Invariant priors for location and scale parameters in Theorem 2.1 do NOT contradict each other. Indeed, we can derive the less intuitive result (2) from the more trivial result (1). Let \([x\mid \theta] \sim f(\cdot\mid \theta)= f_0(\cdot/\theta)/\theta\) as in (\ref{eqt:scaleFamily}). Then we can represent \(x\) as \[x = \theta z,\] where \(z \sim f_0(\cdot)\). It implies that \[\log x = \log z + \log\theta,\] which can be viewed as a location model for \(y =\log x\) with a location parameter \(\phi = \log\theta\). By Theorem 2.1 (1), the invariant prior of \(\phi\) is \[\begin{align}\label{eqt:prior_phi_in_eg} f_{\phi}(\phi) \propto 1. \tag{2.4}\end{align}\] Using the transformation \(\phi = \log \theta\), we can derive back the PDF of \(\theta\) from (\ref{eqt:prior_phi_in_eg}). Since \(\text{d}\phi/\text{d}\theta = 1/\theta\), we have \[f_{\theta}(\theta) \;\propto\; \frac{1}{\theta} f_{\phi}(\log\theta) \;\propto\; \frac{1}{\theta},\] which gives us back part (2) of Theorem 2.1.\(\;\blacksquare\)
how to represent x
Anonymous Warbler   Created at: 2023-02-16 16:06
I still cannot fully understand why $x=\theta z$ with $z\sim f_0(\cdot)$ gives $[x\mid\theta]\sim f_0(\cdot/\theta)/\theta$. If $x=\theta z$, shouldn't $x\sim \theta f_0(\cdot)$? How should we interpret $[x\mid\theta]\sim f_0(\cdot/\theta)/\theta$?
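For \(\theta>0\), the statement \([x\mid\theta]\sim f_0(\cdot/\theta)/\theta\) is just the change-of-variables formula applied to \(x=\theta z\): \[ F_{x\mid\theta}(t) = P(\theta z\le t) = P(z\le t/\theta) = F_0(t/\theta) \quad\Longrightarrow\quad f(t\mid\theta) = \frac{\mathrm{d}}{\mathrm{d}t}F_0(t/\theta) = \frac{1}{\theta}\,f_0(t/\theta). \] So multiplying the random variable \(z\) by \(\theta\) does not multiply the density by \(\theta\); it rescales the argument and divides by \(\theta\) so that the density still integrates to one.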