Exercise 2.2 [OLD]
  [TA] Di Su    Created at: 0000-00-00 00:00   2022A2Ex2 0 
Exercise 2.2 (${\color{red}\star}{\color{red}\star}{\color{gray}\star}$ - Posterior of function of parameters (50%)). Consider the problem in Exercise 2.1. In this exercise, assume the following prior for \((\pi_0, \pi_1)\): \[f(\pi_0, \pi_1) = \mathbb{1}(\pi_0,\pi_1\in(0,1)).\]
  1. (10%) Find the posterior \([\pi_0, \pi_1\mid x_{1:m}^{(0)},x_{1:n}^{(1)}]\).
  2. (10%) Find the posterior \([\theta\mid x_{1:m}^{(0)},x_{1:n}^{(1)}]\). Use representation, or leave your answer in terms of a PDF.
  3. (10%) Is the posterior \([\theta\mid x_{1:m}^{(0)},x_{1:n}^{(1)}]\) conjugate with the prior \(\theta\)? Briefly explain. (Use \(\lesssim 50\) words.)
  4. (10%) Is the posterior \([\theta\mid x_{1:m}^{(0)},x_{1:n}^{(1)}]\) proper? Briefly explain. (Use \(\lesssim 50\) words.)
  5. (10%) Graphically present the prior and posterior PDFs of \(\theta\) if the following dataset is obtained. Briefly comment. (Use \(\lesssim 50\) words.)
  6. (Bonus) Try different priors of \((\pi_0, \pi_1)\) and compare the effect on the posterior of \(\theta\). Briefly comment.
                    Unvaccinated humans   Vaccinated humans
Infected humans     298                   343
Uninfected humans   202                   657
Hints:
  1. Are \(\pi_0\) and \(\pi_1\) independent a priori?
  2. Representation leads to a much neater answer.
  3. As long as the prior and posterior admit the same distributional form (not necessarily a named class of distributions), they are regarded as conjugate.
  4. A proper distribution is one that well-defines a random variable (i.e., its density integrates to one).
  5. This example shows the dual representation of Bernoulli data. For example, a sample of \(n=10\) iid Bernoulli RVs can be represented in one of the following forms:
  • the data are: \(0,0,1,1,0,0,0,0,1,0\); or
  • numbers of 1s and 0s are 3 and 7.
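In R, converting between the two forms of the hint's sample is immediate; a tiny sketch:

```r
x <- c(0, 0, 1, 1, 0, 0, 0, 0, 1, 0)  # raw form of the sample
ones  <- sum(x)                       # number of 1s: 3
zeros <- length(x) - ones             # number of 0s: 7
c(ones = ones, zeros = zeros)
```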

To plot the PDF, you may use one of the following methods.

  • (Method 1: algebraic) Derive the analytical form of the PDF. Then plot it as in Assignment 1.
  • (Method 2: representation) Simulate \(N=10^{6}\) replications of the sample based on the representation. Then use a kernel density estimator (see Chapter 8 of Stat3005) to estimate the PDF. One example is shown below. To plot the density of \(W = (A-10)/(1+B^2)\), where \(A\sim \text{N}(0,1)\) and \(B\sim \text{Exp}(1)\) are independent, you may use the following (simplified) code to produce the graph.
set.seed(1)                            # for reproducibility
N = 1e6                                # number of replications
A = rnorm(N)                           # A ~ N(0,1)
B = rexp(N)                            # B ~ Exp(1), independent of A
W = (A-10)/(1+B^2)                     # replications of W
plot(density(W), lwd=2, col="blue4")   # kernel density estimate of the PDF
abline(h=0, v=0, lty=2, lwd=0.75)      # dashed reference axes
Ex 2.2
 Anonymous Kraken    Created at: 2022-02-09 02:51  0 
For 4., I'm not sure how to show that the posterior is proper. Do we have to show that the integral of the posterior pdf over the support is finite or show that the posterior consists of several random variables of known distributions?
  [TA] Di Su    Last Modified: 2022-02-10 13:12  6 
Both methods are possible.
  • Method 1: Using PDF

To show that the pdf of a random variable $\theta$ is proper, we need to show it satisfies the definition of a probability density function by showing

$$\int_{\Theta}f(\theta)\mathrm{d}\theta=1.$$

We can express the density in terms of its kernel as $f(\theta)=C\kappa(\theta)$. Then we need to show

$$\int_{\Theta}C\kappa(\theta)\mathrm{d}\theta=1\implies\int_{\Theta}\kappa(\theta)\mathrm{d}\theta=\frac{1}{C}.$$

We require $C\neq0$ so that the density is non-degenerate. Then $1/C<\infty$ and we only need to show

$$\int_{\Theta}\kappa(\theta)\mathrm{d}\theta<\infty,$$

because $C$ is explicitly determined by $C=1/\int_{\Theta}\kappa(\theta)\mathrm{d}\theta$.

  • Method 2: Using representation

If the posterior of a parameter $\theta$ can be expressed in terms of known distributions, then it is proper by construction.
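As a numerical companion to Method 1 (a sanity check, not a proof), R's integrate() can confirm that a kernel has a finite integral. The kernel below is a hypothetical example for illustration, not the one from the exercise:

```r
# Hypothetical kernel for illustration: kappa(theta) = (1 + theta)^(-2) on (0, Inf)
kappa <- function(theta) 1/(1 + theta)^2
res <- integrate(kappa, lower = 0, upper = Inf)  # numerical integral of the kernel
res$value      # finite (analytically equal to 1 here)
C <- 1/res$value  # normalizing constant C = 1 / integral of the kernel
```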

Question about part (2)
 Anonymous Giraffe    Created at: 2022-02-09 16:54  1 
Hello, in part (2), I am trying to derive the sampling distribution f(x|theta), in order to find the posterior distribution. However, I am not sure how I can represent the two independent Bernoulli distributions using one single parameter theta. Could I have some hints on this?

Thanks.
  [TA] Di Su    Last Modified: 2022-02-09 23:58  5 
Notice that the group labels $\{(j)\}$ are given and fixed. To express the joint distribution of two groups of data using only one parameter, there are two steps.
Firstly, since the two groups are independent of each other, we can factorize their joint distribution as
$$f(x^{(\text{group }1)},x^{(\text{group }2)})=f(x^{(\text{group }1)})f(x^{(\text{group }2)}).$$
Secondly, the data points within each group are IID, so we can further factorize the joint distribution of each group as
$$f(x^{(\text{group }j)})=\prod_{i=1}^{n_j}f(x^{(\text{group }j)}_i),\quad j=1,2,$$
where $n_1=m$ and $n_2=n$ are the group sizes.
The key is independence.
 Anonymous Giraffe    Created at: 2022-02-09 21:51  1 
Thanks for the reply. I still have a question in mind. I understand $f(x|\pi_0, \pi_1)$ is either in terms of $\pi_0$ or $\pi_1$. However, when it comes to $f(x|\theta)$, $\theta$ is defined as $(\pi_0-\pi_1)/\pi_1$, and I am not sure how to incorporate $\theta$ into $f(x|\theta)$.
Thanks.



  [Instructor] Kin Wai (Keith) Chan    Created at: 2022-02-09 22:14  6 
It is exactly the learning moment of this question. To answer the question, you may proceed in two steps:
  • (Step 1) Derive $[\pi_0, \pi_1 \mid x_{1:m}^{(0)}, x_{1:n}^{(1)}]$.
  • (Step 2) Since $\theta$ is a function of $\pi_0$ and $\pi_1$, you may use representation to define the posterior of $\theta$.
Let's use the following example to illustrate the idea.
  • (Step 1) Suppose we can prove that $f(p_1, p_2 \mid x) = \texttt{dnorm}(p_1,x,1)\times\texttt{dexp}(p_2)$.
  • (Step 2) Let $\phi = p_1 - p_2$. Then, given $x$, the posterior of $\phi$ can be represented as follows:
    \[
    \phi = A- B,
    \]
    where $A\sim \text{Normal}(x,1)$ and $B\sim\text{Exp}(1)$ are independent.
Besides, knowing $\theta$ (but not both $\pi_0$ and $\pi_1$) is not sufficient to define the data generating process of the data $\{x_{1:m}^{(0)}, x_{1:n}^{(1)}\}$. Hence, $f(x_{1:m}^{(0)}, x_{1:n}^{(1)}\mid \theta)$ and $f(x_{1:m}^{(0)}, x_{1:n}^{(1)}\mid \pi_0, \pi_1)$ are very different. Indeed, deriving $f(x_{1:m}^{(0)}, x_{1:n}^{(1)}\mid \theta)$ is highly non-trivial.
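The toy example above can be simulated directly. Here is a minimal sketch of Step 2; the observed value x = 2 is hypothetical:

```r
# Simulate the posterior of phi = p1 - p2 in the toy example above,
# using the representation phi = A - B (x = 2 is a hypothetical datum).
set.seed(1)
N <- 1e6
x <- 2
A <- rnorm(N, mean = x, sd = 1)   # A ~ Normal(x, 1)
B <- rexp(N, rate = 1)            # B ~ Exp(1), independent of A
phi <- A - B                      # draws from the posterior of phi given x
plot(density(phi), lwd = 2, col = "blue4")
abline(h = 0, v = 0, lty = 2, lwd = 0.75)
```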

Independence
 Anonymous Orangutan    Last Modified: 2022-02-10 13:06  0 
Because of the independence, is it possible to rewrite like this ?
\[f(\pi_0,\pi_1) = \mathbb{1}(\pi_0,\pi_1\in(0,1)) = \mathbb{1}(\pi_0 \in(0,1)) \times \mathbb{1}(\pi_1\in(0,1)) = \text{Unif}(0,1) \times \text{Unif}(0,1)?\]

  [TA] Di Su    Last Modified: 2022-02-10 13:12  3 
Yes, it can be factorized as $f(\pi_0,\pi_1) = \mathbb{1}(\pi_0 \in(0,1)) \times \mathbb{1}(\pi_1\in(0,1))$, and in this way we know $\pi_0\sim \text{Unif}(0,1)$ and $\pi_1\sim \text{Unif}(0,1)$, where $\pi_0$ is independent of $\pi_1$.
However, $f(\pi_0,\pi_1)$ is a probability density function, whereas $\text{Unif}(0,1) \times \text{Unif}(0,1)$ represents a distribution, so it is not rigorous to write $f(\pi_0,\pi_1)=\text{Unif}(0,1)\times \text{Unif}(0,1)$.
prior distribution
 Anonymous Orangutan    Last Modified: 2022-02-11 18:09  0 

When I computed θ using the given prior, there is a chance that θ takes a huge value (like 349).

This affects the mean of θ and the shape of its distribution.

As a result, the plotted density has a strange shape and axis scale.

How can I solve this issue?

  [Instructor] Kin Wai (Keith) Chan    Last Modified: 2022-02-11 18:06  5 
Probably, you faced a problem when you plotted the prior density. It is a very good question. Indeed, it is the challenging part of the question: plotting the density of the prior is a bit tricky. As I mentioned in the hints, there are two methods.
  • (Method 1) Algebraically, you can derive the prior pdf of theta. After some algebra, you can find a simple closed-form solution. However, it may take some effort to derive it.
  • (Method 2) You may use simulation. However, the tail of the prior pdf of theta is very fat, so you will obtain many crazily large values of theta. In this case, the R function density(.) is not very stable. There are sophisticated methods to deal with this; you will learn them in Stat 3005 Nonparametric Statistics. For now, I would like to tell you a simple trick for handling it.
    • Step 1: Simulate $\theta$ as usual.
    • Step 2: Select the $\theta$'s that are smaller in magnitude than a certain value (e.g., 5). Store them as thetaSelected.
    • Step 3: Find the probability that $|\theta|$ is smaller than 5. Denote it by p.
    • Step 4: Compute the density of the selected $\theta$'s. In R, set fit = density(thetaSelected).
    • Step 5: Plot (fit$y)*p against fit$x. This is the correct density value after adjusting for the selection bias. You will then obtain a graph similar to this:
For your reference, you may use the following R code.
n = 1e7                                # number of simulated draws
x = runif(n)                           # pi0 ~ Unif(0,1)
y = runif(n)                           # pi1 ~ Unif(0,1), independent of x
theta = x/y-1                          # theta = pi0/pi1 - 1
theta0 = theta[which(abs(theta)<5)]    # Step 2: keep the draws with |theta| < 5
p = mean(abs(theta)<5)                 # Step 3: estimate P(|theta| < 5)
d = density(theta0)                    # Step 4: KDE of the selected draws
plot(d$x, d$y*p, xlim=c(-1,4), type="l", lwd=2, col="blue", ylab="Density", xlab=expression(theta))  # Step 5: adjusted density
abline(v=c(0-1,1/2-1), h=0, lty=2, col="red")
Prior distribution (Q5 solution)
 Anonymous Orangutan    Last Modified: 2022-02-17 14:25  0 
I derived my prior distribution by using representation.
pai0 = runif(N,0,1)
pai1 = runif(N, 0,1)
theta = (pai0-pai1)/pai1
However, in the solution, prior pdf is derived by this step.
It can be derived by finding the joint density of (θ, π1) using the Jacobian method, and then integrate out π1. The true prior density is directly plotted
I'm confused 2 points.
  1. How to derive the joint density of ($\theta, \pi_1$) by using Jacobian.
  2. R code: why do you run rep(0,100) and rep(0.5, 101)? (Why repeat 0, and what is the meaning of 101?)
 d1 = c(rep(0,100), rep(0.5,101), 1/(2*seq(1.01,4,0.01)^2))
  [TA] Di Su    Last Modified: 2022-02-18 10:12  2 
Thanks for your question!
  • Regarding the closed form of the prior:

The prior density is

$$f(\theta)=\left\{\begin{array}{ll}\frac{1}{2}, & -1<\theta<0, \\ \frac{1}{2(\theta+1)^{2}}, & \theta\geq0. \end{array}\right.$$

It can be derived by the following steps. Firstly, let $$y_1=\pi_0/\pi_1-1,y_2=\pi_1.$$ Correspondingly, $$\pi_0=(y_1+1)y_2,\pi_1=y_2.$$ Because of the constraint $\pi_0,\pi_1\in(0,1)$, it must be satisfied that $(y_1+1)y_2\in(0,1),y_2\in(0,1)$. Moreover, it is easily seen that $y_1=\pi_0/\pi_1-1\in(-1,\infty)$. Therefore, we must have $$y_1\in(-1,\infty),y_2\in(0,\min(1,1/(y_1+1))).$$

Secondly, the Jacobian determinant of the mapping from $(y_1,y_2)$ to $(\pi_0,\pi_1)$ is
$$J=\left|\begin{array}{cc}y_2 & y_1+1 \\ 0 & 1 \end{array}\right|=y_2.$$

Thus we have
$$\begin{equation}f(y_1,y_2) = f(\pi_0,\pi_1)\left|J\right| = y_2\mathbb{1}(y_1\in(-1,\infty),y_2\in(0,\min\{1,1/(y_1+1)\})).\end{equation}$$

Thirdly, denoting the support of $y_2$ as $\mathcal{Y}_2$, the marginal distribution of $y_1$ can then be found by
$$\begin{align*} f(y_1) &= \int_{\mathcal{Y}_2}f(y_1,y_2)\,\mathrm{d}y_2=\left\{\begin{array}{ll}\int_{0}^{1}y_2\,\mathrm{d}y_2 & \text{if } 1/(y_1+1)>1\\ \int_{0}^{1/(y_1+1)}y_2\,\mathrm{d}y_2 & \text{if } 1/(y_1+1)\leq1\end{array}\right.\\ &= \left\{\begin{array}{ll} \frac{1}{2} & \text{if } -1<y_1<0\\ \frac{1}{2(y_1+1)^2} & \text{if } y_1\geq0. \end{array}\right. \end{align*}$$

Since $\theta\equiv y_1$, the desired density follows.
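As a quick sanity check of this closed form (a simulation sketch, not part of the solution), one can compare it with draws of $\theta=\pi_0/\pi_1-1$:

```r
# Sanity check of the derived prior density of theta (sketch).
set.seed(1)
N <- 1e6
pi0 <- runif(N)                # pi0 ~ Unif(0,1)
pi1 <- runif(N)                # pi1 ~ Unif(0,1), independent of pi0
theta <- pi0/pi1 - 1           # theta = (pi0 - pi1)/pi1
# Closed-form density derived above
f <- function(t) ifelse(t > -1 & t < 0, 1/2,
                 ifelse(t >= 0, 1/(2*(t + 1)^2), 0))
mean(theta > -1 & theta < 0)   # P(-1 < theta < 0); the closed form gives 1/2
integrate(f, -1, Inf)$value    # integrates to 1, so the density is proper
```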

The solution is also updated.

  • Regarding the R codes:

Repetition of $0$ is for better visualization: we know $f(\theta)=0$ on the interval $[-2,-1)$.

And $101$ is the repetition number, which comes from how we set up the x-axis: the interval $(-1,0)$ is divided into $100$ sub-intervals of length $0.01$, but there are $101$ boundary points, hence the extra $1$.
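Putting this together, the grid behind d1 can be reconstructed as follows (theta.grid is my guess at the solution's x-axis; the three pieces match the density $f(\theta)$ derived above, with the third piece using the fact that $\theta+1$ runs over seq(1.01, 4, 0.01)):

```r
# Reconstruction (assumed) of the solution's plotting grid for the prior of theta.
theta.grid <- seq(-2, 3, 0.01)        # 501 grid points in total
d1 <- c(rep(0,   100),                # theta in [-2, -1): f = 0              (100 points)
        rep(0.5, 101),                # theta in [-1, 0]:  f = 1/2            (101 points)
        1/(2*seq(1.01, 4, 0.01)^2))   # theta in (0, 3]:   f = 1/(2(theta+1)^2) (300 points)
plot(theta.grid, d1, type = "l", lwd = 2, col = "blue",
     ylab = "Density", xlab = expression(theta))
```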


 Anonymous Orangutan    Created at: 2022-02-17 17:41  0 
Thanks for your reply.
However, there are some latex errors.
I hope you will fix this. Thanks
Using Representation in Ex.2.2 Q1
 Anonymous Grizzly    Created at: 2022-02-18 12:43  0 
In Exercise 2.2 Q1,
I represent $(\pi_0,\pi_1)=B_0+B_1$, where $B_0\sim\text{Beta}(S_0+1,m-S_0+1)$ and $B_1\sim\text{Beta}(S_1+1,n-S_1+1)$ because after Jacobian transformation, I can get
$$f\left(\pi_0,\pi_1\mid x_{1:m}^{(0)},x_{1:n}^{(1)}\right)\propto\pi_0^{S_0}(1-\pi_0)^{m-S_0}\pi_1^{S_1}(1-\pi_1)^{n-S_1}\mathbb{1}\left\{\pi_0,\pi_1\in(0,1)\right\}$$
which seems reasonable. However, it turns out that my answer is wrong. What is the reason? Thank you.
  [TA] Di Su    Created at: 2022-02-18 13:15  2 
There are two problems regarding your answer:
  1. You need to mention that $B_0$ and $B_1$ are independent.
  2. The meaning of the notation $B_0+B_1$ hasn't been made clear. Usually, when we write $B_0+B_1$, we are referring to a random variable that can be represented as the summation of two Beta random variables. However, $f(\pi_0,\pi_1)$ is a density function with two variates. Can we equate them? Moreover, why is it a summation (of two independent Beta random variables)?
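For what it's worth, here is a sketch of a valid representation of the posterior in the question: the pair $(\pi_0,\pi_1)$ is represented as $(B_0,B_1)$ with $B_0$ and $B_1$ independent, not as a sum. The counts $S_0=298$, $m=500$, $S_1=343$, $n=1000$ are taken from the dataset in the exercise.

```r
# Sketch: represent the posterior of (pi0, pi1) as the PAIR (B0, B1),
# with B0 and B1 independent Beta draws (not the sum B0 + B1).
set.seed(1)
N  <- 1e6
S0 <- 298; m <- 500     # unvaccinated: infected count / group size
S1 <- 343; n <- 1000    # vaccinated:   infected count / group size
B0 <- rbeta(N, S0 + 1, m - S0 + 1)   # pi0 | data ~ Beta(S0+1, m-S0+1)
B1 <- rbeta(N, S1 + 1, n - S1 + 1)   # pi1 | data ~ Beta(S1+1, n-S1+1), independent
theta <- B0/B1 - 1                   # induced posterior draws of theta
plot(density(theta), lwd = 2, col = "blue4")
```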