About Chapter 5
 [Developer] Kai Pan (Ben) Chu   Created at: 0000-00-00 00:00   Chp5General
Questions / discussion related to Chapter 5 should be placed here.
HPD: normalization step
Anonymous Orangutan   Last Modified: 2022-03-21 13:26
I'm still not sure about the normalization step. ① To me, it is strange that sum(d0) and the interval are separated. Is it possible to rewrite it as below? ② Also, I computed d and I think d is not normalized. Although it is fine to multiply by (theta[2]-theta[1]) again when we compute N, I don't know how to interpret the y-axis of the posterior distribution (from 0 to 5 in this example). Why don't we use d = d0/sum(d0) when we plot (theta, d)?
 [TA] Di Su   Last Modified: 2022-03-23 20:42
① It is fine to write it as below. ② For a continuous random variable X, it is possible that $f_X(x)>1$; what we require is $\int_\mathcal{X}f_X(x)\,\mathrm{d}x=1$. We don't use d = d0/sum(d0) when we plot (theta, d) because we approximate the integral by a Riemann sum, so the term (theta[2]-theta[1]) is needed.
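To make the point concrete, here is a small Python sketch (illustrative only: the grid and the Beta(3,2)-shaped kernel are made up, and the names theta, d0, d mirror the R objects discussed in the question):

```python
import numpy as np

# Hypothetical grid and unnormalized posterior kernel (Beta(3,2)-shaped).
theta = np.linspace(0.001, 0.999, 1000)
d0 = theta**2 * (1 - theta)       # unnormalized kernel
dtheta = theta[1] - theta[0]      # grid spacing, i.e. theta[2]-theta[1] in R

# Riemann-sum normalization: divide by sum(d0) * dtheta so the DENSITY
# integrates to 1 -- not so that the vector of values sums to 1.
d = d0 / (np.sum(d0) * dtheta)

print(np.sum(d) * dtheta)  # ≈ 1: the density integrates to 1
print(d.max())             # > 1 here; a density value is not a probability
```

With d = d0/sum(d0) the values would sum to 1 like a probability mass function, but then plot(theta, d) would shrink toward 0 as the grid gets finer; dividing by the grid spacing as well gives a proper density, whose y-axis can legitimately exceed 1.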
credible interval
Anonymous Mink   Created at: 2024-03-29 17:56
May I ask whether there is any frequentist analogue of the credible interval, like using a confidence interval to reject H0 at some confidence level?
Exercise 4.2
 [TA] Chak Ming (Martin), Lee   Last Modified: 2024-03-23 10:41   A4Ex4.2
Related last year's exercise and discussion can be found here. Exercise 2 (Horse racing (40%)). The Hong Kong Jockey Club (HKJC) organizes approximately 700 horse races every year. This exercise analyses the effect of the draw on winning probability. According to HKJC: The draw refers to a horse's position in the starting gate. Generally speaking, the smaller the draw number, the closer the runner is to the inside rail, hence a shorter distance to be covered at the turns and a slight advantage over horses with bigger draw numbers. The dataset horseRacing.txt, a modified version of the dataset in the GitHub project HK-Horse-Racing, can be downloaded from the course website. It contains all races from 15 Sep 2008 to 14 July 2010. There are six columns:
race (integer): race index (from 1 to 1364).
distance (numeric): race distance in meters (1000, 1200, 1400, 1600, 1650, 1800, 2000, 2200, 2400).
racecourse (character): racecourse ("ST" for Shatin, "HV" for Happy Valley).
runway (character): type of runway ("AW" for all-weather track, "TF" for turf).
draw (integer): draw number (from 1 to 14).
position (integer): finishing position (from 1 to 14), i.e., position=1 denotes the first horse to complete the race.
The first few lines of the dataset are shown below. In this example, we consider all races that (i) took place on the turf runway of the Shatin racecourse, (ii) were of distance 1000m, 1200m, 1400m, 1600m, 1800m or 2000m; and (iii) used draws 1–14. Let \(n\) be the total number of races satisfying the above conditions, and \(\texttt{position}_{ij}\) be the position of the horse that used the \(j\)th draw in the \(i\)th race for each \(i,j\).
For each \(i=1, \ldots, n\), denote \[x_i = \mathbb{1}\bigg( \frac{1}{|\texttt{draw}_i\cap[1,7]|}\sum_{j\in\texttt{draw}_i\cap[1,7]} \texttt{position}_{ij} < \frac{1}{|\texttt{draw}_i\cap[8,14]|}\sum_{j\in\texttt{draw}_i\cap[8,14]} \texttt{position}_{ij} \bigg).\] Denote the entire dataset by \(D\) for simplicity. Suppose that \[\begin{aligned} \left[x_i \mid \theta_{\texttt{distance}_i} \right] & \overset{ {\perp\!\!\!\!\perp } }{\sim} & \text{Bern}(\theta_{\texttt{distance}_i}), \qquad i=1,\ldots,n\label{eqt:raceModel1}\\ \theta_{1000},\theta_{1200},\theta_{1400},\theta_{1600},\theta_{1800},\theta_{2000} & \overset{ \text{iid}}{\sim} & \pi(\theta),\label{eqt:raceModel2}\end{aligned}\] where \(\pi(\theta) \propto \theta^2(1-\theta^2)\mathbb{1}(0<\theta<1)\) is the prior density. (10%) What are the meanings of the \(x\)'s and \(\theta\)'s? (10%) Test \(H_0: \theta_{1000}\leq 0.5\) against \(H_1: \theta_{1000}> 0.5\). (10%) Compute a 95% credible interval for each of \(\theta_{1000},\theta_{1200},\ldots,\theta_{2000}\). Plot them on the same graph. (10%) Interpret your results in part (3). Use no more than about 100 words.
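The indicator \(x_i\) above compares the average finishing position of the inside draws (1–7) with that of the outside draws (8–14) within one race. A hedged Python sketch (the assignment itself uses R; the dict mapping draw number to finishing position and the toy race are made up):

```python
import numpy as np

def race_indicator(positions):
    """x_i = 1 if draws 1-7 finish better (smaller mean position) than draws 8-14.

    positions: hypothetical dict {draw number: finishing position} for one race.
    """
    small = [p for d, p in positions.items() if 1 <= d <= 7]
    big = [p for d, p in positions.items() if 8 <= d <= 14]
    return int(np.mean(small) < np.mean(big))

# Toy race with 10 horses: draw d finished exactly in position d,
# so the inside draws did better on average.
toy = {d: d for d in range(1, 11)}
print(race_indicator(toy))  # 1
```

Note the means divide by the number of draws actually present in each group (the \(|\texttt{draw}_i\cap[1,7]|\) terms), which the list comprehensions handle automatically for races with fewer than 14 horses.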
small typo "H1" in Ex4.2
Anonymous Mink   Created at: 2024-03-29 12:06
Example 3.2
 [TA] Chak Ming (Martin), Lee   Created at: 0000-00-00 00:00   Chp3Eg2
Example 3.2 (${\color{blue}\star}{\color{gray}\star}{\color{gray}\star}$ Binomial model). Let \(x\sim \text{Bin}(n,\theta)\). Consider the Jeffreys prior and the flat prior: \[f_1(\theta) \propto \theta^{-1/2}(1-\theta)^{-1/2} \qquad\text{and}\qquad f_2(\theta) \propto 1,\] for \(\theta\in(0,1)\). The corresponding posteriors are \([\theta\mid x] \sim \text{Beta}(x+1/2, n-x+1/2)\) and \([\theta\mid x] \sim \text{Beta}(x+1, n-x+1)\), respectively. The corresponding MAP estimators are then given by (Exercise) \[\widehat{\theta}_{MAP(1)} = \left\{ \begin{array}{ll} \left[ \frac{x-1/2}{n-1} \right]_0^1 & \text{if $n>1$};\\ x/n & \text{if $n=1$}. \end{array}\right. \qquad\text{and}\qquad \widehat{\theta}_{MAP(2)} = \frac{x}{n},\] where \([a]_0^1 = \min\{\max(a,0),1\}\) for \(a\in\mathbb{R}\). The estimators \(\widehat{\theta}_{MAP(1)}\) and \(\widehat{\theta}_{MAP(2)}\) are equivalent as \(n\rightarrow\infty\). \(\;\blacksquare\)
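A quick numerical check (not part of the notes; the helper names are mine) that the two MAP estimators in Example 3.2 agree as \(n\) grows:

```python
def clip01(a):
    """[a]_0^1 = min(max(a, 0), 1), the clipping in Example 3.2."""
    return min(max(a, 0.0), 1.0)

def map_jeffreys(x, n):
    """MAP under the Jeffreys prior: Beta(x+1/2, n-x+1/2) posterior mode."""
    return clip01((x - 0.5) / (n - 1)) if n > 1 else x / n

def map_flat(x, n):
    """MAP under the flat prior: Beta(x+1, n-x+1) posterior mode, i.e. x/n."""
    return x / n

for n in (10, 100, 10000):
    x = round(0.3 * n)
    print(n, map_jeffreys(x, n), map_flat(x, n))  # gap shrinks as n grows
```

The difference is of order \(1/n\), which is why the two estimators are asymptotically equivalent.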
$\hat{\theta}_{MAP(1)}$
Anonymous Pumpkin   Last Modified: 2024-03-29 10:26
We can get $\hat{\theta}_{MAP(2)}$ by taking the derivative of the $Beta$ density kernel with respect to $\theta$. I.e., let $f(\theta) = \theta^{\alpha - 1}(1-\theta)^{\beta - 1}$, where $\alpha = x+1$, $\beta = n - x + 1$. We have $$ \frac{\partial f}{\partial \theta} = \theta^{\alpha - 2}(1-\theta)^{\beta-2}\left[ (\alpha-1)(1-\theta) - (\beta-1)\theta \right], $$ and then we get $\displaystyle\hat{\theta}_{MAP(2)}=\frac{x}{n}$. However, how can we get $\hat{\theta}_{MAP(1)}$ in a similar way?
Exercise 4.1
 [TA] Chak Ming (Martin), Lee   Created at: 2024-03-18 14:48   A4Ex4.1
Related last year's exercise and discussion can be found here. Exercise 1 (Testing and region estimation (60%)). Let \[\begin{aligned} x_1,\ldots,x_n & \overset{ \text{iid}}{\sim} \text{N}(\theta, \theta^2) , \\ \theta &\sim \theta_0\,\text{Exp}(1),\end{aligned}\] where \(\theta_0=0.5\). Suppose the dataset A4Q1.csv is observed. The goal is to perform inference on \(\theta\).
What does "Modify the prior" mean in (vi)?
Anonymous Mink   Created at: 2024-03-28 22:23
May I ask what "Modify the prior" means in (vi)?
Anonymous Mink   Created at: 2024-03-29 12:07
I think I can try my own prior which is easy to calculate; let me do this, hah.
Anonymous Hippo   Last Modified: 2024-03-29 16:25
I think this simply means modifying the prior for the simple null hypothesis problem. Please refer to Chapter 4.5.
About Chapter 4
 [Developer] Kai Pan (Ben) Chu   Created at: 0000-00-00 00:00   Chp4General
Questions / discussion related to Chapter 4 should be placed here.
small typo
Anonymous Mink   Created at: 2024-03-27 13:07
It looks like there is a typo in our in-class notes?
Exercise 2.1
 [TA] Chak Ming (Martin), Lee   Created at: 2024-02-15 21:41   A2Ex1
Related last year's exercise and discussion can be found here. Example 1 (${\color{red}\star}{\color{black}\star}{\color{black}\star}$ Different types of priors (50%)). Suppose that a new virus that causes a pandemic was recently discovered, and a new vaccine against the new virus was just developed. The aim of this exercise is to estimate the vaccine's efficacy \(\theta\), which is defined as \[\theta = \frac{\pi_0 - \pi_1}{\pi_1},\] where \(\pi_0\) and \(\pi_1\) are the attack rates of unvaccinated and vaccinated humans, respectively. In a preclinical stage, the vaccine was tested on laboratory animals (not humans). The following data were obtained.
Hints:
May I ask why centering the priors at 0.8 and 0.2 is not a strong belief? It seems hard to deny that it is.
Anonymous Mink   Created at: 2024-03-01 17:23
Anonymous Hippo   Created at: 2024-03-29 16:24
I guess only assuming the support is a strong belief, while adjusting the hyperparameters to center the prior according to our belief is not a strong one.
Example 2.2
 [TA] Chak Ming (Martin), Lee   Created at: 0000-00-00 00:00   Chp2Eg2
Example 2.2 (${\color{blue}\star}{\color{gray}\star}{\color{gray}\star}$ Transforming a location parameter to a scale parameter). Invariant priors for location and scale parameters in Theorem 2.1 do NOT contradict each other. Indeed, we can derive the less intuitive result (2) from the more trivial result (1). Let \([x\mid \theta] \sim f(\cdot\mid \theta)= f_0(\cdot/\theta)/\theta\) as in (\ref{eqt:scaleFamily}). Then we can represent \(x\) as \[x = \theta z,\] where \(z \sim f_0(\cdot)\). It implies that \[\log x = \log z + \log\theta,\] which can be viewed as a location model for \(y =\log x\) with a location parameter \(\phi = \log\theta\). By Theorem 2.1 (1), the invariant prior of \(\phi\) is \[\begin{align}\label{eqt:prior_phi_in_eg} f_{\phi}(\phi) \propto 1. \tag{2.4}\end{align}\] Using the transformation \(\phi = \log \theta\), we can derive back the PDF of \(\theta\) from (\ref{eqt:prior_phi_in_eg}). Since \(\text{d}\phi/\text{d}\theta = 1/\theta\), we have \[f_{\theta}(\theta) \;\propto\; \frac{1}{\theta} f_{\phi}(\log\theta) \;\propto\; \frac{1}{\theta},\] which gives us back part (2) of Theorem 2.1.\(\;\blacksquare\)
how to represent x
Anonymous Warbler   Created at: 2023-02-16 16:06
I still cannot fully understand why \(x=\theta z\) with \(z\sim f_0(\cdot)\) gives \([x\mid\theta]\sim f_0(\cdot/\theta)/\theta\). If \(x=\theta z\), shouldn't \(x\sim \theta f_0(\cdot)\)? How should I interpret \([x\mid\theta]\sim f_0(\cdot/\theta)/\theta\)?
Transformation
Anonymous Pumpkin   Last Modified: 2024-03-29 12:03
Why is $\displaystyle\frac{1}{\theta}f_{\phi}(\log\theta)\propto\frac{1}{\theta}$, when $\log\theta$ also contains $\theta$?
About Chapter 1
 [Developer] Kai Pan (Ben) Chu   Created at: 0000-00-00 00:00   Chp1General
Questions / discussion related to Chapter 1 should be placed here.
meaning of Scissors
Anonymous Rabbit   Created at: 2022-01-22 00:00
In the lecture notes, the scissors symbol appears many times. What is the meaning of this symbol?
 [Instructor] Kin Wai (Keith) Chan   Created at: 2022-01-23 12:00
Optional materials are indicated by scissors. For more information about the symbols, please refer to the instructions for the lecture notes on the course website.
ancillary statistic
Anonymous Loris   Created at: 2022-01-26 00:00
Is there any difference between a pivotal quantity and an ancillary statistic (mentioned in the tutorial)?
 [TA] Di Su   Created at: 2022-01-26 12:00
A pivotal quantity may not be a statistic (a function of data) because it may involve unknown parameters. If a pivotal quantity is indeed a statistic, then it is called an ancillary statistic. For example, suppose $X_1,\dots,X_n\overset{iid}{\sim} N(\theta,1)$ where $\theta\in\mathbb{R}$ is an unknown parameter. Using representation, $X_1-\theta=Z_1$ for some $Z_1\sim N(0,1)$ whose distribution is independent of $\theta$, hence the quantity $X_1-\theta$ is a pivotal quantity. However, $X_1-\theta$ is not a statistic since $\theta$ is unknown, hence not an ancillary statistic. On the other hand, the quantity $X_1-\bar{X}$ is pivotal and free of the unknowns, hence an ancillary statistic.
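The ancillarity of $X_1-\bar{X}$ can be checked by simulation. A Python sketch (illustrative; the sample size and number of replications are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_ancillary(theta, n=5, reps=100_000):
    """Draw reps samples of X_1 - Xbar from N(theta, 1) data of size n."""
    x = rng.normal(theta, 1.0, size=(reps, n))
    return x[:, 0] - x.mean(axis=1)   # X_1 - Xbar: does not depend on theta

a0 = sample_ancillary(theta=0.0)
a5 = sample_ancillary(theta=5.0)
# Both have mean ~0 and sd ~sqrt((n-1)/n), whatever theta is.
print(a0.std(), a5.std())
```

Here $\operatorname{Var}(X_1-\bar{X}) = (n-1)/n$ regardless of $\theta$, which the two simulated standard deviations confirm.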
 [Instructor] Kin Wai (Keith) Chan   Created at: 2022-01-26 12:00
Di explained it very well! If you wish to know more about ancillary statistics and pivotal quantities, you may refer to my Stat4003 lecture notes (see here).
Posterior as prior (2021 spring midterm Q1)
Anonymous Orangutan   Created at: 2022-02-27 11:00
To derive the posterior predictive, we can use the posterior distribution as the prior and plug it into the prior predictive. Here is my question: why is $p_j$ still used, not $p_j^{(n)}$, in this case?
 [TA] Di Su   Created at: 2022-03-01 14:37
Thanks for pointing it out. Yes, it should be $p_j^{(n)}$.
About range of parameter
Anonymous Loris   Last Modified: 2022-04-28 01:19
I notice that when updating our belief about $\theta$ using observed data, the range of the parameter in the prior never changes. Algebraically, this is because the indicator cannot be discarded in the calculation (i.e., posterior $\propto$ prior $\times$ sampling distribution). I am wondering if there is an intuitive way to understand why our belief about the range of $\theta$ cannot be modified by data? Thank you very much!
Anonymous Ifrit   Last Modified: 2022-04-28 01:13
I think below is the rough idea, algebraically. The posterior is obtained by multiplying the prior and the sampling distribution (and a normalizing constant). As the values of the prior outside the range of the parameter are 0, no matter what values of the sampling distribution we multiply them by, the resulting posterior values outside the range are still 0 (0 multiplied by anything is 0).
Anonymous Loris   Last Modified: 2022-04-28 13:36
Yes, I think this explains the reason from the algebraic perspective. Thanks a lot :) From my point of view, intuitively, this shows one limitation of the Bayesian approach: we must correctly specify the support of the parameter in the prior; otherwise, we may fail to cover the true model. I am not sure if this explanation is good. Maybe there are better intuitions :)
Anonymous Ifrit   Last Modified: 2022-04-29 15:53
In my opinion, it is not a limitation of the Bayesian approach itself, because Bayesian inference does not require us to specify a (bounded) support for the prior. If we are afraid that we cannot specify the correct support of the parameter's distribution, why not just let the prior distribution have unbounded support? For example, suppose we are fairly sure that theta falls between 0 and 1 but, to play safe, do not want to eliminate the possibility that theta falls outside that range. Then we could construct a prior like this: f(theta) ∝ 0.99999 for 0 < theta < 1, and 0.000000000000001 otherwise (for -infinity < theta < 0 or 1 < theta < infinity). The prior keeps our strong belief while not eliminating the possibility that theta falls out of the range. After incorporating the data to compute the posterior, the data can correct us if our belief is wrong (for example, see the solution to the optional part of A2). I think this type of prior (with unbounded support, highly concentrated in some region(s) while having extremely low density everywhere else) shows the full flexibility of the Bayesian approach. In this sense, having a prior with bounded support is actually quite a strong belief. Unless we are 100% sure that theta must not fall into some region, a prior with unbounded support is a safer choice.
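The "safety mass" prior described above can be sketched numerically. A hedged Python example (the grid, likelihood, and data summary are invented; 20 observations from N(theta, 1) with sample mean 5 stand in for data that contradict the prior's preferred region):

```python
import numpy as np

theta = np.linspace(-2, 8, 2001)
dtheta = theta[1] - theta[0]
# Almost all prior mass on (0, 1), tiny but nonzero density elsewhere.
prior = np.where((theta > 0) & (theta < 1), 0.99999, 1e-15)

# Log-likelihood of 20 N(theta, 1) observations with sample mean 5.
loglik = -0.5 * 20 * (theta - 5.0)**2

post = prior * np.exp(loglik - loglik.max())   # subtract max for stability
post /= post.sum() * dtheta                    # Riemann-sum normalization

print(theta[np.argmax(post)])  # ≈ 5: the data pulled us out of (0, 1)
```

Because the prior is nonzero everywhere, strong evidence can move the posterior mode far outside (0, 1); with a prior that is exactly zero there, it never could.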
Frequentist vs Bayesian
Anonymous Auroch   Created at: 2024-01-26 16:52
I remember the lecture notes say that the frequentist point of view can be understood as Bayesian with a very strong prior belief, in which case the Bayesian calculations are not meaningful because the posterior equals the prior. I wonder if the roles can be reversed. For instance, can we set up a null hypothesis with a random theta and test it using the frequentist philosophy (e.g., theta is uniformly distributed between 0.4 and 0.6)? I know that treating theta as random is already Bayesian, but can we still do some frequentist-like calculations?
Constant Prior
Anonymous Mink   Last Modified: 2024-01-31 22:19
Can I say the prior is the true statement if, no matter how many times we observe data, the posterior always stays the same as the prior?
Prior Predictive
Anonymous Pumpkin   Created at: 2024-02-29 22:10
Why is the prior predictive defined as $f(x_{1:n})$ in the lecture notes (Chapter 1, Page 3), but as $f(x_{n+1})$ in the handwritten notes?
Bayes Factor
Anonymous Pumpkin   Created at: 2024-02-29 22:07
Example 3.3
 [TA] Chak Ming (Martin), Lee   Created at: 0000-00-00 00:00   Chp3Eg3
Example 3.3 (${\color{blue}\star}{\color{gray}\star}{\color{gray}\star}$ Regression). Suppose that the covariates \(x_1,\ldots, x_n\) are fixed and known. Let \[\begin{align} y_i & \overset{ {\perp\!\!\!\!\perp } }{\sim} & \text{N}(\theta x_i , \sigma_0^2), \qquad i=1, \ldots, n;\\ \theta &\sim&\text{N}(\theta_0, \tau_0^2), \end{align}\] where \(\sigma_0, \tau_0>0\) and \(\theta_0\in\mathbb{R}\) are given. Then the posterior of \(\theta\) is given by \[\begin{align} f(\theta\mid y_{1:n}) &\propto& f(y_{1:n}\mid \theta) f(\theta) \\ &\propto& \exp\left\{ - \frac{1}{2\tau_0^2} (\theta-\theta_0)^2\right\} \prod_{i=1}^n \exp\left\{ -\frac{1}{2\sigma_0^2}(y_i - x_i \theta)^2 \right\} \\ &\propto& \exp\left\{ -\frac{1}{2}\left[ \left( \frac{1}{\tau_0^2} + \frac{\sum x_i^2}{\sigma_0^2} \right)\theta^2 - 2\left( \frac{\theta_0}{\tau_0^2} + \frac{\sum x_i y_i }{\sigma_0^2} \right)\theta\right] \right\}.\end{align}\] By Lemma 1.2, we know that \[\left[ \theta\mid y_{1:n} \right] \sim \text{N}\left( \frac{B}{A}, \frac{1}{A}\right), \qquad \text{where} \qquad A = \frac{1}{\tau_0^2} + \frac{\sum x_i^2}{\sigma_0^2} \qquad \text{and} \qquad B = \frac{\theta_0}{\tau_0^2} + \frac{\sum x_i y_i }{\sigma_0^2}.\] The MAP estimator of \(\theta\) is \[\widehat{\theta}_{MAP} = \frac{B}{A} = \frac{\theta_0({\sigma_0^2}/{\tau_0^2}) + {\sum x_i y_i }}{({\sigma_0^2}/{\tau_0^2}) + {\sum x_i^2} }. \tag*{$\blacksquare$}\] 
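The formulas in Example 3.3 can be checked numerically. A Python sketch with made-up data and hyperparameters (all values here are assumptions for illustration):

```python
import numpy as np

rng = np.random.default_rng(1)
n, theta_true, sigma0, tau0, theta0 = 50, 2.0, 1.0, 10.0, 0.0

# Simulate the model of Example 3.3: y_i ~ N(theta * x_i, sigma0^2).
x = rng.uniform(0, 1, n)
y = theta_true * x + rng.normal(0, sigma0, n)

A = 1 / tau0**2 + np.sum(x**2) / sigma0**2
B = theta0 / tau0**2 + np.sum(x * y) / sigma0**2
theta_map = B / A   # posterior mean = MAP, since the posterior is normal

print(theta_map)    # with a diffuse prior (large tau0), ≈ least-squares slope
```

With a large \(\tau_0\) the prior precision \(1/\tau_0^2\) is negligible, so \(B/A\) nearly equals the least-squares slope \(\sum x_i y_i / \sum x_i^2\); a small \(\tau_0\) would shrink the estimate toward \(\theta_0\).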
why $[\theta\mid y_{1:n}]\sim\text{N}(B/A, 1/A)$
Anonymous Armadillo   Created at: 2024-02-28 22:07
For $[\theta\mid y_{1:n}]\sim\text{N}(B/A, 1/A)$, I would like to ask why the mean is $B/A$ and the variance is $1/A$. Thank you!
Anonymous Pumpkin   Created at: 2024-03-29 10:36
Please refer to Chapter 1, Page 10, Lemma 1.2.