Friday, 2 August 2013

Guide for Mathematics T Assignment, Sem 3/2013

You can't copy all the answers this semester, because the assignment involves random digits and it is practically impossible for two people to end up with exactly the same data. Hence, this guide only helps you work through the questions. It serves only as a guide, so there are no detailed solutions here.

Question 1
Give the definition of subjective probability. Find the definition online and describe three examples. Don't just give them, describe them! For example: "The subjective probability that a Chinese player will win the match, as judged by Chinese fans, is 0.95 because they have faith in him."

Question 2
Generate thirty 3-digit numbers and sort them into three categories: three different digits, two same digits and three same digits. Tabulate your results and find the probability of each category (e.g. if 20 of the 30 numbers fall into a category, its probability is 20/30 ≈ 0.67).

These are your probabilities for the three categories, so your answers will depend on the numbers you generated.

When making deductions, just say which cases have a high chance of occurring and which have a low chance.
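The following is only an illustration of the tally, not the assignment's required method. This minimal Python sketch sorts thirty 3-digit numbers into the three categories and tabulates the frequencies and probabilities. It makes three assumptions: numbers run from 000 to 999, leading zeros count as digits, and Python's `random` module with an arbitrary seed stands in for your own experiment.

```python
import random
from collections import Counter

CATEGORIES = ("three different", "two same", "three same")

def category(number):
    """Classify a number from 000 to 999 by its digits (leading zeros count)."""
    distinct = len(set(f"{number:03d}"))  # e.g. 7 -> "007" -> 2 distinct digits
    return {3: "three different", 2: "two same", 1: "three same"}[distinct]

def tabulate(numbers):
    """Return {category: (frequency, probability)} for a list of numbers."""
    counts = Counter(category(x) for x in numbers)
    n = len(numbers)
    return {c: (counts[c], counts[c] / n) for c in CATEGORIES}

if __name__ == "__main__":
    rng = random.Random(2013)  # arbitrary seed, only so the run is repeatable
    sample = [rng.randrange(1000) for _ in range(30)]
    for cat, (freq, prob) in tabulate(sample).items():
        print(f"{cat:<16}{freq:>3}  {prob:.2f}")
```

The printed table has the same shape as the one the question asks for: category, frequency, probability.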

Question 3
Generate one hundred 3-digit random numbers using Excel, a calculator, or a website. Here's a website to help you:
http://www.random.org/integers/

Again, compartmentalise them into three categories.

This time you'll be asked to find the 90% and 95% confidence intervals, which requires knowledge of sampling and estimation.

Assuming that the sampling distribution of the sample proportion is approximately normal, finding the 90% confidence interval means finding the central 90% region of the distribution: you cut off 5% of the area in the left tail and 5% in the right tail. The same idea applies to the 95% confidence interval: keep the central 95% of the area under the graph, cutting off 2.5% in each tail. (If this is unclear, look for the diagrams in your textbook.) Since a 95% confidence interval has a higher confidence level, it should be wider than the 90% interval.

This question requires only direct application of the formula, so just do it. Show your steps, and summarise your results in a table, which would be helpful.
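This sketch assumes your syllabus uses the usual normal-approximation interval for a proportion. For a sample proportion p̂ = x/n with q̂ = 1 − p̂, that interval is p̂ ± z·√(p̂q̂/n), with z = 1.645 for 90% and z = 1.96 for 95%. The count of 72 out of 100 below is made up for illustration and is not real data.

```python
import math

# Two-tailed critical values of the standard normal distribution.
Z = {90: 1.645, 95: 1.96}

def proportion_ci(count, n, level):
    """Symmetric confidence interval for a proportion: p +/- z * sqrt(p q / n)."""
    p = count / n
    se = math.sqrt(p * (1 - p) / n)  # estimated standard error
    half_width = Z[level] * se
    return p - half_width, p + half_width

# Illustrative count only: 72 three-different-digit numbers out of 100.
for level in (90, 95):
    lo, hi = proportion_ci(72, 100, level)
    print(f"{level}% CI: ({lo:.3f}, {hi:.3f})")
```

For 72 out of 100, the 95% interval is roughly (0.632, 0.808), which is wider than the 90% interval of roughly (0.646, 0.794).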

Question 4
a) I'm not sure what is expected in this part, but we simply tabulate the probabilities obtained in Q3.

b)
Generate another one hundred 3-digit random numbers and find the observed frequency of each category. A chi-squared goodness-of-fit test is then performed to decide whether to accept or reject the null hypothesis. Your null hypothesis is "the distribution fits the distribution in (a)" and your alternative hypothesis is "the distribution does not fit the distribution in (a)".

You are expected to know the chi-squared test. Apply the formula directly and check whether the calculated chi-squared value exceeds the critical value. If it does, the null hypothesis is rejected, and you can conclude that the distribution obtained in (b) does not fit the distribution obtained in (a).

Note that for the chi-squared test to be valid, every category must have an expected frequency of at least 5. If your "three same digits" category has an expected frequency below 5, merge it with another category of your choice so that the merged category has an expected frequency of at least 5.

If merging is done, you are left with two categories, so the number of degrees of freedom is 2 − 1 = 1. You then need to check whether the expected frequencies of all categories are at least 10. If any expected frequency is less than 10, Yates' correction must be applied. An expected frequency is unlikely to be below 10 once two categories have been merged, but if this does happen, use this equation:

$$\chi^2 = \sum_{i} \frac{\left(|O_i - E_i| - 0.5\right)^2}{E_i}$$

If no merging is needed (all expected frequencies are at least 5, so there are 2 degrees of freedom), or if there is 1 degree of freedom but every expected frequency is at least 10, ignore this correction and just use the ordinary chi-squared formula.

Note:
Use Yates' correction only when
1) degrees of freedom = 1, and
2) at least one expected frequency < 10.
Both conditions must be met for Yates' correction to be used.
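The rules above can be sketched in Python: merge when an expected frequency is below 5, then apply Yates' correction only when ν = 1 and some expected frequency is below 10. Several details are assumptions of this sketch: the category names, the choice of merging "three same" into "two same", and the 5% critical values 3.841 (ν = 1) and 5.991 (ν = 2) read from standard chi-squared tables. The observed counts in the example are illustrative.

```python
# 5% critical values of the chi-squared distribution, from standard tables.
CRITICAL_5PCT = {1: 3.841, 2: 5.991}

def chi_squared_test(observed, probs):
    """Chi-squared goodness-of-fit test following the rules in this guide.

    observed: counts keyed by "three different", "two same", "three same"
    probs: hypothesised probabilities with the same keys (e.g. from Q4(a))
    Returns (statistic, degrees of freedom, Yates used?, reject H0 at 5%?).
    """
    n = sum(observed.values())
    obs = dict(observed)
    exp = {c: n * probs[c] for c in obs}  # expected frequency = n * probability

    # Merge "three same" into "two same" when its expected frequency is below 5.
    # (The guide lets you pick the partner category; this choice is an assumption.)
    if exp["three same"] < 5:
        obs["two or three same"] = obs.pop("two same") + obs.pop("three same")
        exp["two or three same"] = exp.pop("two same") + exp.pop("three same")

    df = len(obs) - 1
    # Yates' correction only when df = 1 and some expected frequency is below 10.
    yates = df == 1 and min(exp.values()) < 10
    stat = 0.0
    for c in obs:
        diff = abs(obs[c] - exp[c]) - (0.5 if yates else 0.0)
        stat += diff ** 2 / exp[c]
    return stat, df, yates, stat > CRITICAL_5PCT[df]

# Illustrative counts only: 70, 29 and 1 out of 100, tested against 0.72 / 0.27 / 0.01.
result = chi_squared_test(
    {"three different": 70, "two same": 29, "three same": 1},
    {"three different": 0.72, "two same": 0.27, "three same": 0.01},
)
print(result)
```

In this example the "three same" expected frequency is 1, so it is merged, leaving ν = 1. Both expected frequencies (72 and 28) are at least 10, so no Yates' correction is applied. The statistic is about 0.198, well below 3.841.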

The chi-squared test is chosen here because the data are categorical (counts falling into categories), so a normal distribution cannot be assumed.

Question 5
(a) Use the subjective probability you obtained earlier.

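(b) Generate sixty-four 3-digit numbers using your calculator, sort them into the categories again and find the probabilities. Then use a hypothesis test with the normal approximation to decide whether to accept or reject the null hypothesis. The claim being tested is that the probability that a number has three different digits is more than the probability you suggested in (a). So take the null hypothesis H0: p = p0 (where p0 is the probability from (a)) against the alternative hypothesis H1: p > p0. This is a one-tailed test.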

A hypothesis test using the normal approximation is chosen because the quantity investigated, the number of three-different-digit numbers out of 64, follows a binomial distribution, and a binomial distribution with a sample this large is well approximated by a normal distribution.
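Here is a minimal sketch of the one-tailed test in (b). Under H0, the count X of three-different-digit numbers is B(64, p0), approximated by N(64p0, 64p0q0). The sketch applies a continuity correction (using x − 0.5 for P(X ≥ x)), which is a common textbook convention but an assumption here. It compares the result with the 5% one-tailed critical value 1.645. The example figures (50 out of 64, p0 = 0.72) are illustrative.

```python
import math

def one_tailed_test(x, n, p0, z_crit=1.645):
    """Test H0: p = p0 against H1: p > p0 with the normal approximation to B(n, p0).

    Returns the z statistic and whether H0 is rejected; z_crit = 1.645
    corresponds to a 5% one-tailed test.
    """
    mean = n * p0
    sd = math.sqrt(n * p0 * (1 - p0))
    z = (x - 0.5 - mean) / sd  # continuity correction: P(X >= x) ~ P(Y > x - 0.5)
    return z, z > z_crit

z, reject = one_tailed_test(50, 64, 0.72)
print(f"z = {z:.3f}, reject H0: {reject}")
```

With 50 out of 64, z is about 0.952, below 1.645, so H0 is not rejected at the 5% level even though 50/64 is larger than 0.72.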


*The guide above is only a guide and is not an official answer by MPM. 

76 comments:

  1. Thank you for your teaching. I love it. It helps a lot.
    There's something in the assignment's question that I can't understand well.

    For Q4c, the categories are divided into 3 different, 2 same and 3 same digits.

    During the calculation for chi square test, I put my
    X = occurrence of the digit,
    n= 3 (three digit)
    p= digit occurrence
    q= digit absence
    P(occurrence)= 0.10
    X~B(3, 0.1)

    When it comes to the binomial calculation $P(X=r) = {}^{n}C_{r}(0.1)^{r}(0.9)^{n-r}$, I notice r needs to take the values 0, 1, 2 and 3.

    For the zero occurrence of the digit, do I create another category for it?

    As for the significance level, do I assume it is 5% or 1%?
    Because this will determine whether the distribution is accepted or rejected.

    Thanks

  2. Umm... chi-squared tests do not involve binomial calculations, at least for this question. You simply count the numbers that have three same digits and then find the probability of getting one in the set of numbers you obtained. As far as I know, no binomial calculation is involved.

    The significance level used, as far as I have observed, is normally 5% or 1%. I only learnt the chi-squared test a few days ago, so I'm not very familiar with this topic, but since the usual significance levels are 5% and 1%, I guess we pick one of them for the test. There's no right or wrong for your assignment as long as your steps are correct. If your null hypothesis is rejected, then let it be. It simply means your sample is very random and unreliable. =)

  4. Does it mean that my observation,Oi is from the 50 digits and the Ei is from my 100 digits?

    If my results are:
    P( 3 diff digit ) = 0.82, f= 41 for 50 DIGITS
    P( 3 diff digit ) = 0.69, f= 69 for 100 DIGITS

    For chi squared test on category 3 diff digits

    METHOD 1:-
    Oi = 41
    Ei = nP(X=x) , where n = 50 and P(X=x) = 0.69

    METHOD 2:-
    Oi = 41
    Ei = 69

    Which method is correct? Sorry if I'm a bit slow to catch the idea of your message.

  5. Refer to Q4(a).
    If your subjective probabilities are as follows:
    three same digits: 0.05
    three different digits: 0.70
    two same digits: 0.25
    then your expected frequency, Ei, is 5 for three same digits, 70 for three different digits, and 25 for two same digits.

    You simply assume there are 100 numbers and multiply each probability by 100 to get its expected frequency.

    Your observed frequencies come from the numbers you generated in 4(b). If you get three numbers with three same digits, your observed frequency is 3, and so (Oi − Ei) = (3 − 5).

  6. Hello~ I have a question about the assignment and I hope you can answer it. For question 5(b), do we have to use a hypothesis test with the normal distribution? I'm not sure how to use this method. Do we have to use the 5% significance level? I simply thought I just needed to generate all the numbers and calculate the probability of getting 3 different digits, e.g. 50/64, which is higher than the probability suggested in (a).
    Thanks.

  7. A hypothesis test is a more proper way to investigate whether the result deviates far from an assumption, so carrying out the test is better. I can't explain it step by step here because it would be long, so I suggest you read your reference book. Just reading that part is enough. =)

    1. Okay, by the way, do we need to use the 5% significance level in the solution to that question? Is it a one-tailed or a two-tailed test?

    2. Could you please specify which question you are referring to?

    3. Oh, I am talking about question 5b.

  8. Can you send me a full report for my reference? My email is nasheerahalmi@ymail.com

    1. Sorry, my school teacher requested us to submit handwritten reports. Therefore, besides this guide here, I have no other soft copies relevant to it.

  10. Just a comment on Q4(a): if you look closely, those subjective probabilities should fall within the confidence intervals obtained. A confidence interval gives an interval estimate (an interval that is very likely to contain the true value of the proportion).

  11. hi, I hope this reaches you in time :S sorry for the late questions!

    For 4(b), the question asks 'how do you ensure that the sample obtained is a random sample?'

    You said to use the 'chi-squared test'

    but how do I explain it? Aren't they asking for the method of obtaining random samples... I'm not sure how to relate it to using the 'chi-squared test'. Could it be for 4(c)?

    If I wrote something like 'I folded ten pieces of paper, each with a different digit from 0-9... etc etc... asked random people... took from box... etc etc...', would it be accepted? O.o
    Sorry for the late questions, but I'm super confused now :'( please help asap, thx!

    1. No, no... I explained why the chi-squared test is chosen because some people asked me why this test was chosen and not a hypothesis test or another test statistic. It is not meant to show that the sample is random.

      As for how to explain that it's random, I guess it's up to you. =)

  12. thank you so much!
    So for 4(b), if I wrote something like 'I folded ten pieces of paper, each with a different digit from 0-9... etc etc... asked random people... took from box... etc etc...', would it be accepted?

    I hope it is, I've already written all that in my papers ><

    1. Might be. Lol I'm not a teacher, just a student like you, except that my teacher let us look at her answers for a brief moment and I thought I could share them, so I can't say for sure whether it's right or wrong. Good luck anyway!

  13. Just a reminder:
    Subjective probability is based on an individual's personal judgment about how likely a specific outcome is, while objective probability is based on analysis in which each measure comes from a recorded observation rather than a subjective estimate.

    So the probability you calculated in Q2 is NOT a subjective probability. It's the probability obtained from the sample, i.e. the objective probability. (In fact, the question says "…… GIVE a reasonable probability ……"; it is not asking you to calculate it from the results obtained.) So it should be based on your personal judgment, not on recorded observations.

    For example:
    "Based on my personal judgment, since there are 720 3-different-digits-numbers, 270 2-same-digits-numbers and 10 3-same-digits-numbers from 000 to 999, I conclude that the probability of these numbers should be 0.72, 0.27 and 0.01 respectively."
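    These counts can be checked by brute force. This short Python sketch treats 000 to 999 as three-digit strings with leading zeros and groups every number by how many distinct digits it has:

```python
from collections import Counter

# Group every number 000-999 by how many distinct digits it has.
counts = Counter(len(set(f"{n:03d}")) for n in range(1000))

three_different = counts[3]  # 10 * 9 * 8
three_same = counts[1]       # 000, 111, ..., 999
two_same = counts[2]         # everything else
print(three_different, two_same, three_same)  # 720 270 10
```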

    And http://www.random.org/ doesn't generate numbers through a simple random number function. It uses atmospheric noise and then performs some processing on it. (Please refer to http://www.random.org/randomness/ and http://www.random.org/history/) Some other online generators use physical sources such as lava lamps and radioactive decay, which makes the numbers more random than those from a pseudo-random number generator like the simple random number function.

    So if you plan to use an online random number generator for this question, make sure you know how it generates its numbers, because not all random number generators use a random number function.

    Just use calculator or excel and you will have no problem.

    For Q3, you also need to comment on your answers, not just apply the formula and obtain results. You can talk about the length of the interval and the estimation error. Also, since the confidence intervals for 3-same-digit numbers contain negative values, and p cannot be negative, maybe (yes, maybe) this indicates that:
    1. The confidence interval is not a good or representative approach in this case.
    2. You can't have 90% or 95% symmetrical confidence intervals for 3-same-digit numbers in this case.
    3. The sample size (n = 100) is still too small. (The negative values can be eliminated with a larger sample size in this case, such as n = 1000.)
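    Point 3 can be seen numerically. This small Python sketch evaluates the lower end of the 95% interval p ± 1.96√(pq/n) for an illustrative proportion of 0.01 (roughly the theoretical 10/1000, not anyone's actual sample) at n = 100 and n = 1000. It holds p fixed, whereas a real sample proportion would also vary with n.

```python
import math

def lower_bound(p, n, z=1.96):
    """Lower end of the symmetric interval p - z * sqrt(p(1 - p) / n)."""
    return p - z * math.sqrt(p * (1 - p) / n)

for n in (100, 1000):
    print(n, round(lower_bound(0.01, n), 4))
```

    With p = 0.01, the lower end is about −0.0095 at n = 100 but about 0.0038 at n = 1000.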


    I wonder how you all generated random numbers for Q4(b). The "using other method" part is fine: numbers on banknotes, ISBNs of books, bar codes, you name it. But the "from your calculator or computer" part??? (Especially the calculator part. I don't think there's any way to do it on a calculator other than Ran#.) That's the catch. I just hope it's a joke or an oversight by MPM. And I'd like to know what you think about it.

    Personally, I use the bytes (in binary form) of a radio noise file (.wav file) and perform some processing so that the numbers generated are random enough. Here's my example http://www.sniperkitten.tk/generate-random-number-from-radio-noise/ (written in PHP). I just hope it meets the “calculator/computer requirement”, in case the requirement is real......

  14. And did your teacher tell you anything about the maximum number of pages for this assignment, or whether there is such a requirement? (I remember my teacher told me there was one in Sem 1.) My teacher doesn't even want to touch this assignment yet. (Maybe my teacher wants to finish chapters 5 & 6 first. But I don't like waiting.)

    1. No actual limit, but it's best to keep it to about 10 pages.
      MPM does not provide answers, so all the answers and how you should do them depend on your teacher, as there is no standardised format. Except perhaps a state-level one. (Some states have a head for the assignment, like the one in my school.)

    2. I see. Thanks for the info.

      So what about my previous comment on the subjective probabilities you calculated in Q2? Aren't they objective probabilities, since you obtained them from actual data?

    3. Yes, it's a mistake. Will be corrected. Thanks for pointing it out!

  16. UPDATE: For Q4(a), I think this question has something to do with Bayes' theorem, not just simple tabulation and comparison, since the question tells us to "revise" instead of "compare". And revise means "reconsider and alter (something) in the light of further evidence." (Google the term "revise")

    Will post an update in this forum when I know what to do:
    http://cforum.cari.com.my/forum.php?mod=viewthread&tid=3138375
    Just look for sniperkitten's post

    1. The sample size in Q3 is bigger than the sample size in Q2, which makes the result more accurate, so the probabilities obtained in Q3 are more accurate than those obtained in Q2. Hence you could just use them for the next section. Bayes' theorem is not involved.

      But like I said, the answers are prepared by your teacher, so it's possible your teacher or you have another approach to the problem. But it's actually a simple one: just use the probabilities obtained in Q3 to replace those in Q2. The term 'revise', as defined, means "alter something in light of further evidence", and the evidence here is the larger sample size in Q3, which makes the probability estimate more accurate.

      Best advised to follow your teacher.

    3. The problem with Q2 that I (and most of my friends) overlooked is that the question just tells us to "...... generate thirty 3-digit-NUMBERS......" (NUMBERS only, not necessarily random numbers), while only Q3 tells us to generate "......one hundred 3-digit RANDOM numbers......"

      The "simple experiment" in Q2 is not necessarily a way to generate random numbers (so the probability of getting 3 different digits, for example, can be as low as 0.1), while "using a random number function" is obviously one of many methods to generate random numbers.

      So you can't replace the subjective probability from Q2 with the objective probability from Q3. They are two different methods that do DIFFERENT things, not different methods that try to do the same thing.

      And if you search for ways to revise a subjective (or prior) probability, you will find a lot of results that use Bayes' theorem & inference.

      Since the question tells us to use "... results obtained in step 3...", you can use the confidence interval for P(E|H), given that P(H|E) is the revised probability.

    4. It's actually well understood that "generate thirty 3-digit numbers" means to come up with thirty 3-digit numbers randomly by, say, taking the last 3 digits of IC numbers. You can't come up with organised numbers that specifically restrict the frequency of each category unless you do it deliberately, and I don't see what purpose that would serve in this assignment. This assignment is meant to investigate random numbers generated by different methods (a simple experiment in Q2, a computer in Q3, another method in Q4 and a function in Q5), because that way large variation can be obtained and the investigation of "random" numbers becomes clearer.

      It's really a simple assignment. I'd advise you not to complicate it. But like I said, stick to your method if you believe it is correct, as this guide is only an answer done by me and provided by my teacher. And follow your teacher, as the marking scheme is set by your teacher (and hence varies between schools).

  17. So how do I do Q2? Do I need to make a probability table and state "Based on my personal judgement etc", or skip the table and just state the statement?

    1. Follow your teacher on these things, as different schools have different marking schemes.

  18. Does anyone know how to do 4a? I don't know what the question wants.

  19. For Q4(a), if your probability falls within the confidence interval, you don't need to revise it.

  20. Sorry, may I know whether you prepared an introduction, methodology and conclusion? I don't know why my teacher asked us to do that, and we have to submit by tomorrow.

    1. Intro: Briefly describe what subjective probability, the chi-squared test and hypothesis testing are. Get the info from the internet.
      Method: Write your steps in words. Do it after you've got your results.
      Conclusion: A brief conclusion for all the questions; an application would be helpful.

  21. OK Danny2312, I must admit I made things REALLY COMPLICATED because I didn't realise the main purpose of this question until now. Sorry for that.

    This whole assignment (maybe except for Q5) is not about investigating how "RANDOM" the numbers generated by different methods are. (That is what I had in mind when I started this assignment, and it made a mess of me.)

    It's just about investigating how many 3-different, 2-same and 3-same numbers there are in the range 000-999. Of course we know theoretically that there are 720, 270 and 10 of these numbers respectively, but for this assignment we're supposed to assume we have NO KNOWLEDGE of that, so we investigate these numbers statistically. (Just like the real world, where not everything can be known without some statistics because you may not have any prior knowledge.)

    This is why Q3 and Q4 tell us to generate RANDOM numbers. Only then does every number from 000-999 have an EQUAL probability of occurring, so that we can use the sample obtained to get a better approximation of how many 3-different, 2-same and 3-same digit numbers there actually are from 000-999.

    And this is why we're told to revise the subjective probability obtained in Q2. The "simple experiment" may be biased in some way (e.g. you may obtain only 400 3-different-digit numbers out of a sample of 1000, when in fact you should have obtained around 720), so it is not REPRESENTATIVE of how many 3-different, 2-same and 3-same digit numbers there are from 000-999. (For example, we can't say that there are only about 400 3-different-digit numbers from 000-999 just because we obtained 400 of them using a method that may be biased in some way.)

    As for Q4(a), it is far simpler than I thought (although hard to figure out). It's part of Bayesian modelling and inference. Just use equation (8) on page 2 here (right after the words "Using equations 3 and 4:"):
    http://www.cs.berkeley.edu/~jordan/courses/260-spring10/lectures/lecture5.pdf

    Here E(μ|x) is the revised mean, μ0 and σ0 (sigma nought, with a "0") are the prior mean and prior standard deviation (obtained from Q2), x is the mean obtained from the sample (in this case, from Q3), and σ (without "0") is the standard deviation of the sample from Q3. For this question, just substitute the proportion for the mean.

    Actually this method makes sense. For example, if your prior (or subjective) proportion for 3-different is 0.6 (Q2) and the sample proportion is 0.75 (Q3), then the revised proportion is about 0.68. And if you revise the new proportion 0.68 with another sample with a proportion of 0.73, you get about 0.70. So as you revise the prior proportion again and again using random samples that are large enough (where the sample proportion is always around 0.72), the revised proportion will eventually settle around 0.72, even if your prior proportion is as small as 0.1.
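    The arithmetic behind these figures can be sketched in Python. It uses the standard normal-prior, normal-sample result: posterior mean = (μ0/σ0² + x/σ²)/(1/σ0² + 1/σ²), i.e. a precision-weighted average. One assumption is needed to reproduce the numbers above: each variance is taken as p(1 − p) without dividing by the sample size. Dividing by n instead would weight the larger Q3 sample much more heavily and give a different answer. The second sample proportion, 0.73, is the hypothetical one from the example.

```python
def revise(prior_mean, prior_var, sample_mean, sample_var):
    """Normal-normal Bayesian update: weight each estimate by its precision (1 / variance)."""
    w_prior = 1.0 / prior_var
    w_sample = 1.0 / sample_var
    post_mean = (w_prior * prior_mean + w_sample * sample_mean) / (w_prior + w_sample)
    post_var = 1.0 / (w_prior + w_sample)
    return post_mean, post_var

# Assumption: each variance is p(1 - p), not divided by the sample size.
p1, v1 = revise(0.6, 0.6 * 0.4, 0.75, 0.75 * 0.25)  # Q2 prior revised with the Q3 sample
p2, v2 = revise(p1, v1, 0.73, 0.73 * 0.27)          # a second, hypothetical sample
print(f"{p1:.2f} {p2:.2f}")
```

    This prints 0.68 and 0.70, matching the figures quoted above.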

    Since Q4(a) only tells us to use the results in Q3, you can only do one revision. And it doesn't matter if the test in Q4(c) shows that the distribution doesn't fit the one in Q4(a). It simply implies that more revision is required. (And what would be the purpose of this question if the distribution obtained always fit?)

    Sorry for my previous comments, which I made without making sense of the whole assignment first.

    1. It's okay. No need to apologise. We tend to complicate things when we are nervous. Just make sure we don't let it overwhelm us. =)

  22. Actually I don't know how to comment on Q3... especially how to comment on the length of the interval and the estimation error... please help me~ thank you =)

    1. I don't know what to comment either. But I did comment on whether the probabilities I obtained in Q2 fall within the confidence intervals.

    2. That means if the probability I obtained in Q2 falls within the confidence interval, I don't need to revise it in 4(a)?

    3. And for the expected frequencies in 4(c), should I take the probability I got in Q2 or in Q3, if the probability I obtained in Q2 falls within the confidence interval in Q3... please reply asap because I'm trying to hand it in tomorrow~ sorry for bombarding you with questions... paiseh

    4. Should I generate a list of 3-digit numbers in Q5(a) or just write the suggested probability?

    5. Sorry, I had been away for a while. How did it go? I assume you have handed in your assignment.

  23. Do I need to use Yates' correction (after merging) if my degree of freedom is 1 but my expected frequencies are more than ten?? thanks =)

  24. Sorry =) I wanted to ask: for Q3, how do I calculate the sample mean and standard deviation? And is the sample size 100?

    1. The sample mean (the sample proportion) is p, while its variance is pq/n,
      where p = probability that it occurs
      q = probability that it does not occur
      n = sample size

      Your sample size is 100 (the number of 3-digit numbers), your p is the probability you obtained, and q = 1 − p.

  25. gimme conclusion plss

  26. I saw a sample before: in 4c, they used 0.72 × n as the expected frequency for 3 different digits. How did they get it?

    1. Your probability of obtaining 3 different digits is 0.72, and 'n' is the sample size, which is 100 or 50, depending on which total frequency you want to use.

  27. Is it just me, or do we not get a single 'three same digit' number in Q3? Is it okay if I don't have a three-same-digit number? Or do I have to forge one? :P

    1. It's normal. Just say the probability is zero. =)

  28. So for question 2, do we need to suggest another reasonable probability for each case, or do we just use the probabilities obtained from the experiment?

  29. What if my probability obtained in Q2 does not fall within some of the confidence intervals in Q3? Can I still use the results of Q3 in question 4(a)?

    1. If this happens, it means your subjective probabilities from Q2 are rejected, so you should use the ones you got in Q3.

  30. Standard variance is aka standard error squared?

  31. Mate, how do you do the Conclusion? I mean, it's 8 marks. I know I have to conclude on every question, but that doesn't look like 8 marks. Thanks in advance

    SAMI

    1. Well, that's how I did it. Lol I don't really care how the marks are given.

  32. When I do Q4, I want to explain that the subjective probability in Q2 falls within the range I obtained in Q3. But what is this 'range' actually called? I mean, when I describe it like this: "... we can see that the subjective probability, p, obtained in Q2 lies in the range of ???", what does "???" really stand for? Can I write it as "the range of objective probability"? But I don't think that is correct... = (
    A big thank you for your helping hands.

  33. Can the "range of ???" be written as the "symmetrical confidence interval"? So... it would become >>> "... we can see that the subjective probabilities, p, obtained in Q2 lie in the symmetrical confidence intervals at the 90% and 95% confidence levels". Am I right?

  34. Hey Danny, is it true that the higher the confidence level the wider the confidence interval?

  35. Hi! Sorry for disturbing...
    There are some questions I want to ask.
    I hope you can reply.
    What's the difference between digital and non-digital random sampling?
    And how do I comment on the answer for question 3?
    Thanks for reading my comment.

    1. I don't know the difference between the two kinds of random sampling, sorry.
      As for the comments, I commented on which category has the highest probability, and then which confidence interval is the widest.

  36. Hey Danny, isn't Yates' correction outside our syllabus? Do we need to use it in the assignment? Thanks in advance.

    1. It's a mathematical rule, not a formula. Just because it isn't in our syllabus doesn't mean it shouldn't be applied when a situation requires it.

      But I believe most of the data we obtained won't require Yates' correction, so I don't think you'll need to apply it. Check my guide to determine whether you need it.

    2. Omg! I didn't do it. arghhh... anyway thanks... really appreciate your reply.
