Teenblue: Guide for Mathematics T Assignment, Sem 3/2013

Friday, 2 August 2013

You can't simply copy someone else's answers, because this semester's assignment involves random digits and no two people will have exactly the same data. This guide only helps you work through the questions; it does not give detailed solutions.

Question 1
Give the definition of subjective probability (look it up online) and describe three examples. Don't just list them, describe them! For example: "The subjective probability that a Chinese player will win the match, as judged by Chinese supporters, would be 0.95 because they have faith in him".

Question 2
Generate 30 random three-digit numbers, then sort them into three categories: three different digits, two same digits and three same digits. Tabulate your results and find the probability (relative frequency) of each category. For example, if 20 of your 30 numbers fall into a category, its probability is 20/30 ≈ 0.67.

These are your probabilities for the three categories. Work them out from the random numbers you generated.

When making your deduction, just state which categories have a high chance of occurring and which have a low chance.
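If you would rather check your tally by computer, here is a minimal sketch of the sorting step. It assumes each number is written as a three-character string from 000 to 999; the names `classify` and `tabulate` and the fixed seed are my own choices for illustration, not part of the assignment.

```python
import random
from collections import Counter

CATEGORIES = ("three different digits", "two same digits", "three same digits")

def classify(number: str) -> str:
    """Classify a three-digit string by how many distinct digits it contains."""
    distinct = len(set(number))
    if distinct == 3:
        return CATEGORIES[0]
    if distinct == 2:
        return CATEGORIES[1]
    return CATEGORIES[2]

def tabulate(numbers):
    """Return {category: (frequency, relative frequency)} for the sample."""
    counts = Counter(classify(n) for n in numbers)
    total = len(numbers)
    return {cat: (counts.get(cat, 0), counts.get(cat, 0) / total) for cat in CATEGORIES}

# Hypothetical sample of 30 numbers; the seed is fixed only so the run is repeatable.
rng = random.Random(2013)
sample = [f"{rng.randrange(1000):03d}" for _ in range(30)]
for category, (freq, prob) in tabulate(sample).items():
    print(f"{category:<24} {freq:>3} {prob:.2f}")
```

Your own numbers will of course give a different table; the point is only the way each number is sorted into a category.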

Question 3
Generate 100 random three-digit numbers using Excel, a calculator or a website. Here's a website to help you:
http://www.random.org/integers/

Again, sort them into the three categories.

This time you are asked to find the 90% and 95% confidence intervals for each proportion. This draws on your knowledge of sampling and estimation.

Assuming the sample proportion is approximately normally distributed, the 90% confidence interval covers the central 90% of the distribution, so you cut off 5% from the left tail and 5% from the right tail. The same idea applies to the 95% confidence interval: keep the central 95% of the area under the curve and cut off 2.5% from each tail. (If you don't get what I mean, look for the diagrams in your books.) Since the 95% confidence interval carries a higher confidence level, the interval you get should be wider.

This question only requires direct application of the formula, so just do it. Show your steps; summarising the results in a table would be helpful.
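As a rough check on your hand calculation, the usual normal-approximation interval for a proportion, p̂ ± z·sqrt(p̂(1 − p̂)/n), can be sketched as below. The tally of 70 out of 100 is a made-up example, and `proportion_ci` is just a name I picked.

```python
from math import sqrt
from statistics import NormalDist

def proportion_ci(successes: int, n: int, confidence: float):
    """Symmetric normal-approximation confidence interval for a proportion."""
    p_hat = successes / n
    # z leaves (1 - confidence) / 2 in each tail, e.g. 5% per tail for 90%.
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    margin = z * sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - margin, p_hat + margin

# Hypothetical tally: 70 of 100 numbers had three different digits.
for level in (0.90, 0.95):
    low, high = proportion_ci(70, 100, level)
    print(f"{level:.0%} interval: ({low:.4f}, {high:.4f})")
```

Running it for both levels shows the 95% interval coming out wider than the 90% one, as expected.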

Question 4
a) I'm not sure what to do for this part, but we simply wrote the subjective probabilities obtained in Q3 in a table.

b)
Generate another 100 random three-digit numbers and find the observed frequency of each category. A chi-squared test is then performed to decide whether to reject the null hypothesis. Your null hypothesis is "the distribution fits the distribution in (a)" and your alternative hypothesis is "the distribution does not fit the distribution in (a)".

You are expected to know the chi-squared test. Just apply the formula directly and see whether the calculated chi-squared value exceeds the critical value. If it does, the null hypothesis is rejected, and you can conclude that the distribution obtained in (b) does not fit the distribution obtained in (a).

Note that for the chi-squared calculation to be valid, each category must have an expected frequency of at least 5. If your "three same digits" category has an expected frequency below 5, you need to merge it with another category of your choice so that the merged category has an expected frequency of at least 5.

If merging is done, you are left with two categories, so your degrees of freedom equal 1 (2 - 1 = 1). You then have to check whether the expected frequencies of all categories are above or below 10. If an expected frequency is less than 10, Yates' correction needs to be employed. This is unlikely now that you've merged two categories, but if the situation does arise, use this equation:

$$\chi^2 = \sum_i \frac{(|O_i - E_i| - 0.5)^2}{E_i}$$

If no merging is done (so your degrees of freedom are 2), or if your degrees of freedom are 1 but all your expected frequencies exceed 10, ignore this correction and just use the normal chi-squared formula.

Note:
Use Yates' Correction Only When
1) degree of freedom = 1
2) expected frequency < 10
Both conditions need to be met for Yates' Correction to be employed.
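Here is a minimal sketch of the whole procedure: expected frequencies, merging, the Yates rule above and the statistic. It assumes the categories are listed in the order three different, two same, three same, and that a small last category is merged into its neighbour; the function names and the example figures are hypothetical. The critical values are the standard 5% table values for 1 and 2 degrees of freedom.

```python
def chi_squared(observed, expected, yates=False):
    """Chi-squared statistic; with yates=True, 0.5 is subtracted from each |O - E|."""
    correction = 0.5 if yates else 0.0
    return sum((abs(o - e) - correction) ** 2 / e for o, e in zip(observed, expected))

def goodness_of_fit(observed, probabilities):
    """Return (statistic, degrees of freedom, whether Yates' correction was used)."""
    n = sum(observed)
    observed = list(observed)
    expected = [n * p for p in probabilities]
    # Merge the last category into its neighbour if its expected frequency is below 5.
    if expected[-1] < 5:
        observed[-2:] = [observed[-2] + observed[-1]]
        expected[-2:] = [expected[-2] + expected[-1]]
    df = len(observed) - 1
    # Rule from the note above: df = 1 AND some expected frequency below 10.
    use_yates = df == 1 and min(expected) < 10
    return chi_squared(observed, expected, use_yates), df, use_yates

CRITICAL_5_PERCENT = {1: 3.841, 2: 5.991}  # chi-squared table, 5% significance level

# Hypothetical data: probabilities from (a) and observed counts from (b).
stat, df, used_yates = goodness_of_fit([74, 23, 3], [0.70, 0.25, 0.05])
print(f"chi-squared = {stat:.3f}, df = {df}, Yates used: {used_yates}")
print("reject H0" if stat > CRITICAL_5_PERCENT[df] else "do not reject H0")
```

With probabilities such as 0.72, 0.27 and 0.01, the last expected frequency is 1, so the sketch merges the last two categories and works with 1 degree of freedom.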

The chi-squared test is chosen here because the data being investigated are categorical (frequencies of categories), which do not, and could not, follow a normal distribution.

Question 5
(a) Use the subjective probability you obtained earlier.

(b) Generate 64 three-digit numbers using your calculator. Sort them into the categories again and find the probabilities. Then use a hypothesis test (normal approximation) to decide whether to reject the null hypothesis. Your null hypothesis is "the probability that a number has three different digits equals the probability you suggested in (a)", and your alternative hypothesis is "the probability is more than the probability you suggested in (a)".

The normal-approximation hypothesis test is chosen because the quantity being investigated here (the proportion of numbers with three different digits) is numerical, and its sampling distribution can be approximated by a normal distribution.
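A minimal sketch of the one-sided test, assuming H0: p = p0 against H1: p > p0 at the 5% level and no continuity correction (your textbook may apply one). The values p0 = 0.70 and 50 out of 64 are made up for illustration, and the function name is my own.

```python
from math import sqrt
from statistics import NormalDist

def one_sided_proportion_test(successes: int, n: int, p0: float, alpha: float = 0.05):
    """Test H0: p = p0 against H1: p > p0 with the normal approximation."""
    p_hat = successes / n
    # Under H0 the sample proportion has standard error sqrt(p0 * (1 - p0) / n).
    z = (p_hat - p0) / sqrt(p0 * (1 - p0) / n)
    critical = NormalDist().inv_cdf(1 - alpha)
    return z, critical, z > critical

# Hypothetical: p0 = 0.70 from (a); 50 of the 64 numbers had three different digits.
z, critical, reject = one_sided_proportion_test(50, 64, 0.70)
print(f"z = {z:.3f}, critical value = {critical:.3f}")
print("reject H0" if reject else "do not reject H0")
```

Note that a sample proportion higher than p0 is not enough on its own; the test statistic also has to exceed the critical value.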

*The guide above is only a guide and is not an official answer by MPM.

76 comments:

Unknown, 4 August 2013 at 21:03
Thank you for your teaching. I love it. It helps a lot.
There is something in the assignment question that I don't understand well.

For Q4(c), the categories are three different, two same and three same digits.

In the calculation for the chi-squared test, I put
X = occurrence of the digit,
n = 3 (three digits)
p = digit occurrence
q = digit absence
P(occurrence) = 0.10
X ~ B(3, 0.1)

When it comes to the binomial calculation $P(X = r) = \binom{n}{r}(0.1)^r(0.9)^{n-r}$, I notice that r needs to take the values 0, 1, 2 and 3.

For zero occurrences of the digit, do I create another category?

As for the significance level, do I assume it is 5% or 1%?
Because this will determine the acceptance or rejection of the distribution.

Thanks

Danny2312, 4 August 2013 at 22:51
Umm... chi-squared tests do not involve binomial calculation, at least for this question. You simply count the numbers that have three same digits, and then find the probability of finding such a number in the set you obtained. As far as I know, binomial calculation is not involved.

The significance level used, as far as I have seen, is normally 5% or 1%. I only learnt chi-squared a few days ago so I'm not very familiar with this topic, but since the usual significance level is 5% or 1%, I guess we pick one of them for the test. There's no right or wrong for your assignment as long as your steps are correct. If your null hypothesis is rejected, then let it be. It simply means your sample is just very random and unreliable. =)

Unknown, 5 August 2013 at 19:17
Does it mean that my observed frequency, Oi, comes from the 50 digits and Ei comes from my 100 digits?

If I have
P(3 diff digits) = 0.82, f = 41 for 50 DIGITS
P(3 diff digits) = 0.69, f = 69 for 100 DIGITS

For the chi-squared test on the category 3 diff digits:

METHOD 1:-
Oi = 41
Ei = nP(X=x), where n = 50 and P(X=x) = 0.69

METHOD 2:-
Oi = 41
Ei = 69

Which method is correct? Sorry if I'm a bit slow to catch the idea of your message.

Danny2312, 5 August 2013 at 19:41
Refer to Q4(a).
If your subjective probabilities are as follows:
three same digits: 0.05
three different digits: 0.70
two same digits: 0.25
then your expected frequency, Ei, is 5 for three same digits, 70 for three different digits and 25 for two same digits.

You simply assume there are 100 numbers, and multiply each probability by that total to get the expected frequency.

Your observed frequencies come from the numbers you generated in 4(b). If you get 3 numbers with three same digits among those random numbers, then your observed frequency is 3, and thus (Oi - Ei) = (3 - 5).

Unknown, 6 August 2013 at 19:28
thanks XD

Anonymous, 17 August 2013 at 19:06
Hello~ I have a question about the assignment and I hope you can answer it. For question 5(b), must we use the hypothesis test with the normal distribution? I'm not sure how to use this method. Do we have to use the 5% significance level? I simply thought that I just needed to generate all the numbers and calculate the probability of getting 3 different digits, e.g. 50/64, which is higher than the probability suggested in (a).
Thanks.

Danny2312, 17 August 2013 at 22:34
A hypothesis test is a more proper way to investigate whether the result deviates far from an assumption, so it is better to carry out the test. I can't go through it step by step here because it would be long, so I suggest you read your reference book. Reading just that part is enough. =)

Unknown, 19 August 2013 at 10:35
can you send me a full report for my reference?my email is nasheerahalmi@ymail.com

Ethan Tiang, 20 August 2013 at 19:47
Just a comment on Q4(a): if you look, those subjective probabilities should fall within the confidence intervals obtained. A confidence interval gives an interval estimate (an interval which is very likely to contain the true value of the proportion).

melissafizz, 20 August 2013 at 22:19
Hi, I hope this reaches you in time :S Sorry for the late questions!

For 4(b), the question asks 'how do you ensure that the sample obtained is a random sample?'

You said to use the chi-squared test,

but how do I explain that? Aren't they asking for the method of obtaining random samples? I'm not sure how to relate it to the chi-squared test. Could that be for 4(c)?

If I wrote something like 'I folded ten pieces of paper, each with a single different digit from 0-9... etc... asked random people... take from box... etc...', would it be accepted? O.o
Sorry for the late questions, but I'm super confused now :'( Please help ASAP, thanks!

melissafizz, 20 August 2013 at 22:57
Thank you so much!
So for 4(b), if I wrote something like 'I folded ten pieces of paper, each with a single different digit from 0-9... etc... asked random people... take from box... etc...', would it be accepted?

I hope it is; I've already written all of that in my paper ><

Unknown, 21 August 2013 at 23:15
Just some reminders:
Subjective probability is based on an individual's personal judgment about whether a specific outcome is likely to occur, while objective probability is based on analysis in which each measure comes from a recorded observation rather than a subjective estimate.

So the probability you calculated in Q2 is NOT a subjective probability. It's a probability obtained from the sample, i.e. an objective probability. (In fact, the question states “…… GIVE a reasonable probability ……”; it does not ask you to calculate one from the results obtained.) So it should be based on your personal judgment, not on recorded observations.

For example:
"Based on my personal judgment, since there are 720 3-different-digits numbers, 270 2-same-digits numbers and 10 3-same-digits numbers from 000 to 999, I conclude that the probabilities of these numbers should be 0.72, 0.27 and 0.01 respectively."
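The counts quoted above can be checked by brute force. This is only a small sketch that enumerates every number from 000 to 999 and counts how many distinct digits each one has.

```python
from collections import Counter

# Key = number of distinct digits in the zero-padded three-digit string.
counts = Counter(len(set(f"{n:03d}")) for n in range(1000))

print("three different digits:", counts[3])
print("two same digits:", counts[2])
print("three same digits:", counts[1])
```

Dividing each count by 1000 gives the theoretical probabilities of the three categories.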

Also, http://www.random.org/ doesn't generate random numbers through a simple random number function. It uses atmospheric noise and then performs some processing on it. (Please refer to http://www.random.org/randomness/ and http://www.random.org/history/.) Some other online generators use external sources like lava lamps and radioactive decay, to make the numbers generated more random than those from a pseudo-random number generator such as the simple random number function.

So if you plan to use an online random number generator for this question, please make sure you know how it generates random numbers, because not all random number generators use a random number function.

Just use a calculator or Excel and you will have no problem.

For Q3, you also need to comment on your answer, not just apply the formula and obtain results. You can talk about the length of the interval and the estimation error. Also, since the confidence intervals for 3-same-digits numbers contain negative values, and p cannot be negative, maybe (yes, maybe) this indicates that:
1. A confidence interval is not a good approach, or is not representative, in this case.
2. You can't have a symmetrical 90% or 95% confidence interval for 3-same-digits numbers in this case.
3. The sample size (n = 100) is still too small. (The negative values can be eliminated by using a larger sample size in this case, such as n = 1000.)
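To illustrate the third point, here is a small sketch comparing the lower end of the 95% interval for n = 100 and n = 1000. It assumes a sample proportion of 0.01 for three same digits in both cases, which is a made-up figure; `lower_bound` is just an illustrative name.

```python
from math import sqrt
from statistics import NormalDist

def lower_bound(p_hat: float, n: int, confidence: float) -> float:
    """Lower end of the symmetric normal-approximation interval for a proportion."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return p_hat - z * sqrt(p_hat * (1 - p_hat) / n)

# Hypothetical sample proportion of 0.01 for the three-same-digits category.
for n in (100, 1000):
    print(f"n = {n}: lower bound = {lower_bound(0.01, n, 0.95):.4f}")
```

Under these assumptions the lower bound is negative for n = 100 and positive for n = 1000.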

I wonder how you people generate random numbers for Q4(b). The "using other method" part is fine: numbers on banknotes, ISBNs of books, bar codes, you name it. But the "from your calculator or computer" part??? (Especially the calculator part. I don't think there's any way on a calculator other than Ran#.) That's the catch. I just hope it's a joke or an oversight by MPM. I would like to know what you think about it.

As for me, I use the bytes (in binary form) of a radio noise file (.wav file) and perform some processing so that the numbers generated are random enough. Here's my example: http://www.sniperkitten.tk/generate-random-number-from-radio-noise/ (written in PHP). I just hope it meets the “calculator/computer requirement”, in case that requirement is real......

Unknown, 22 August 2013 at 00:13
And did your teacher tell you anything about a maximum number of pages for this assignment? Is there such a requirement? (I remember my teacher telling me there was one in Sem 1.) My teacher hasn't wanted to touch anything about this assignment so far. (Maybe my teacher wants to finish chapters 5 and 6 first. But I don't like waiting.)

Unknown, 23 August 2013 at 20:20
UPDATE: For Q4(a), I think this question has something to do with Bayes' theorem, not just simple tabulation and comparison, since the question tells us to "revise" instead of "compare". And revise means "reconsider and alter (something) in the light of further evidence." (Google the term "revise".)

Will post an update in this forum when I know what to do:
http://cforum.cari.com.my/forum.php?mod=viewthread&tid=3138375
Just look for sniperkitten's post

Anonymous, 24 August 2013 at 11:49
So how do we do Q2? Do I need to make the probability table and state "Based on my personal judgement..." etc., or can I skip the table and just give the statement?

Anonymous, 24 August 2013 at 12:18
Does anyone know how to do 4(a)? I don't know what the question wants.

Anonymous, 25 August 2013 at 14:33
For Question 4(a), if your probability falls within the confidence interval, you don't need to revise it.

summersnow, 25 August 2013 at 23:29
Sorry, may I know whether you are planning to write an introduction, methodology and conclusion? I don't know why my teacher asked us to do that, and we have to submit by tomorrow.

Unknown, 27 August 2013 at 00:57
OK Danny2312, I must admit I made things REALLY COMPLICATED because I didn't realise the main purpose of this assignment until now. Sorry for that.

This whole assignment (maybe except for Q5) is not about investigating how "RANDOM" the numbers generated by different methods are. (That is what I had in mind when I started doing this assignment, and it made a mess of me.)

It's just about investigating how many 3-different, 2-same and 3-same numbers there are in the range 000-999. Of course we know theoretically that there are 720, 270 and 10 of these numbers respectively, but for this assignment we're assumed to have NO KNOWLEDGE of that, so we investigate these numbers statistically. (Just like the real world, where not everything can be known without some statistics because you may not have any knowledge about it.)

This is why Q3 and Q4 tell us to generate RANDOM numbers. Only when every number from 000-999 has an EQUAL probability of occurring can we use the sample obtained to give a better approximation of how many 3-different, 2-same and 3-same numbers there actually are from 000-999.

And this is why we're told to revise the subjective probability obtained in Q2: the "simple experiment" may be biased in some way (e.g. you may obtain only 400 3-different-digits numbers out of a sample of 1000 when in fact you should have obtained around 720) and so may not be REPRESENTATIVE of how many 3-different, 2-same and 3-same numbers there are from 000-999. (E.g. we can't say that there are only about 400 3-different-digits numbers from 000-999 just because we obtained 400 of them using a method which may be biased in some way.)

As for Q4(a), it is far simpler than I thought (although hard to figure out). It's a part of Bayesian modelling and inference. Just use equation (8) on page 2 from here (right after the words "Using equations 3 and 4:"):
http://www.cs.berkeley.edu/~jordan/courses/260-spring10/lectures/lecture5.pdf

where E(u|x) is the revised mean, u0 and sigma nought (with a "0") are the prior mean and prior standard deviation (obtained from Q2), x is the mean obtained from the sample (in this case, from Q3), and sigma (without "0") is the standard deviation of the sample from Q3. For this question, just substitute proportions for the means.

Actually this method makes sense. For example, if your prior (or subjective) proportion for 3-different is 0.6 (Q2) and the sample proportion is 0.75 (Q3), then the revised proportion is about 0.68. And if you revise the new proportion 0.68 with another sample with a proportion of 0.73, you will get about 0.70. So as you revise the prior proportion again and again using random samples that are large enough (where the sample proportion is always around 0.72), the revised proportion will eventually settle around 0.72, even if your prior proportion is as small as 0.1.

Since Q4(a) only gives us the results in Q3, you can do only one revision. And it doesn't matter if the test in Q4(c) shows that the data don't fit the distribution in Q4(a). It simply implies that more revision is required. (And what would be the purpose of this question if the distribution obtained always fit?)

Sorry for my previous comments made without making sense of the whole assignment in the first place.

WeNNy Hi3W, 1 September 2013 at 01:15
Actually I don't know how to comment on Q3... especially on the length of the interval and the estimation error... please help me~ thank you =)

michelle, 1 September 2013 at 17:46
Do I need to use Yates' correction (after merging) if my degrees of freedom is 1 but my expected frequencies are more than ten?? Thanks =)

D3rFLiVV, 3 September 2013 at 01:39
Sorry =) I wanted to ask: for Q3, how do we calculate the sample mean and standard deviation, and is the sample size 100?

KLOO, 10 September 2013 at 18:06
gimme conclusion plss

Anonymous, 15 September 2013 at 12:17
I saw a sample before where, in 4(c), they used 0.72 x n as the expected frequency for 3 different digits. How did they get it?

Natasha, 16 September 2013 at 12:33
Is it just me, or do we not obtain a single 'three same digit' number in Q3? Is it okay if I don't have a three same digit number? Or do I have to forge one? :P

Anonymous, 16 September 2013 at 15:39
So for question 2, do we need to suggest another reasonable probability for each case? Or do we just use the probabilities obtained from the experiment directly?

Anonymous, 17 September 2013 at 00:33
What if my probability obtained in Q2 does not fall within some of the confidence intervals in Q3? Can I still use the Q3 results in question 4(a)?

Lin, 17 September 2013 at 23:44
Is the variance also known as the standard error squared?

Anonymous, 18 September 2013 at 17:19
Mate, how do you do the Conclusion? I mean, it's 8 marks. I know I have to conclude about every question, but that does not look like 8 marks' worth. Thanks in advance.

SAMI

Anonymous, 18 September 2013 at 22:02
When I do Q4, I want to explain that the subjective probability in Q2 fell into the range that I obtained in Q3. But what is that 'range' actually called? I mean, when I write "... we can see that the subjective probability, p, obtained in Q2 lies in the range of ???", what does "???" really stand for? Can I write it as "the range of objective probability"? I don't think that is correct... = (
A big thank you for your helping hand.

Anonymous, 18 September 2013 at 22:20
Can the "range of ???" be written as the "symmetrical confidence interval"? So it becomes >>> "... we can see that the subjective probabilities, p, obtained in Q2 lie in the symmetrical confidence intervals at the 90% and 95% confidence levels". Am I right?

Anonymous, 19 September 2013 at 22:36
Hey Danny, is it true that the higher the confidence level the wider the confidence interval?

Lee Ying, 22 September 2013 at 11:38
Hi! Sorry for disturbing...
There are some questions I want to ask.
Hope you can reply to me.
What's the difference between digital and non-digital random sampling?
And how do I comment on the answer for question 3?
Thanks for reading my comment.

Anonymous, 22 September 2013 at 12:21
Hey Danny, isn't Yates' correction outside our syllabus? Do we need to use it in the assignment? Thanks in advance.