-
RANDOM NUMBER GENERATORS
What are random numbers?
Imagine yourself picking up a card from a well-shuffled full deck of cards. What could the card you picked be? The jack of spades, the king of diamonds, or the queen of hearts? Well, it could be the ace of clubs too, or any of the other cards in the deck.
In terms of probability theory, picking up a card from a well-shuffled full deck of cards is a ‘random experiment’. It is an experiment because picking a card produces an outcome, a result of the effort. And it is a random one because, although you know all the possible cards that may be picked, you cannot predict which one will actually turn up.
Also, in this particular random experiment no card is preferred over another. In simple words, the probability of picking any card is the same; in the statistical sense, the events of picking different particular cards are equally likely. This is what we call a random phenomenon. So, picking up a card from a well-shuffled full deck of cards can be interpreted as picking a card randomly from the deck.
Now imagine yourself selecting a number from the set {0, 1, 2, 3, 4, 5, 6, 7, 8, 9} randomly. It means that selecting 0, 1, 2, …, or 9 are equally likely events. Thus, we are selecting a random number from the set.
How to generate random numbers?
A random number from a given set of numbers can be generated by the following methods:
- Lottery Method: Suppose there are n numbers in the set. One can take n similar balls, give each ball a unique number from the set, and put them in an urn. Shuffle the balls, then pick balls one by one with replacement and note down the number on each ball picked. The numbers noted will be random numbers.
- Roulette Wheel: One can also take a roulette wheel, divide it into n equal sectors, write the numbers (one from the set on each sector), spin the wheel, and note down the number it stops at. Here too the numbers noted will be random numbers.
- Random Number Table: The above two methods are physical, and it always takes a considerable amount of time to draw random numbers that way. Instead, one can use a random number table, a table in which random digits are stacked up. However, since the random numbers drawn by one person may quite easily be duplicated by another, there has always been uncertainty about the randomness of this method; we quite often call the numbers drawn this way pseudo-random numbers.
Drawing Random Numbers using algorithms (for computers)
Since a computer is a deterministic device, it might seem impossible that it could be used to generate random numbers: the numbers generated are computed algorithmically and are entirely deterministic. However, they appear to be random, and they must pass stringent tests designed to ensure that they behave as truly random numbers (such as those produced by the first two methods above) would.
Requirements of a Random Number Generator:
- It should be fast.
- It should be repeatable.
- It should be amenable to analysis.
- It should have a long period.
- It should be apparently random.
Some Random Number Generator (RNG) Methods
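One classical RNG method is the linear congruential generator (LCG). The sketch below uses the well-known "Numerical Recipes" constants; this is only an illustrative example of the recurrence, not a recommendation of a production-quality generator:

```python
class LCG:
    """Linear congruential generator: x_{k+1} = (a*x_k + c) mod m."""

    def __init__(self, seed=12345):
        self.state = seed
        self.m = 2**32          # modulus
        self.a = 1664525        # multiplier ("Numerical Recipes" constants)
        self.c = 1013904223     # increment

    def next_uniform(self):
        # Advance the state and scale it into [0, 1)
        self.state = (self.a * self.state + self.c) % self.m
        return self.state / self.m


rng = LCG(seed=42)
samples = [rng.next_uniform() for _ in range(10_000)]
# Repeatability: re-seeding with the same value reproduces the same stream,
# which satisfies the "repeatable" requirement listed above.
```

Note how the generator meets several of the listed requirements: it is fast (one multiply-add per draw), repeatable (same seed, same stream), and amenable to analysis (its period and lattice structure are well understood).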
-
SIMULATION
Simulation is a numerical technique for conducting experiments on a digital computer; it involves certain types of mathematical and logical models that describe the behaviour of a business or economic system (or some component thereof) over extended periods of time.
Simulation deals with both abstract and physical models, and some simulations with physical and abstract models might involve real people. Two types of simulation involving real people are:
- Operational gaming
- Man-machine simulator
Merits of Simulation
- In cases where obtaining data is either impossible or very expensive, simulation is used.
- The observed system may be so complex that it cannot be described by a mathematical model.
- Even though a mathematical model may be formed to describe the system, it may not always be possible to find a straightforward analytical solution.
- It may either be impossible or very costly to perform experiments to validate a mathematical model describing the system.
Demerits of Simulation
- Simulation is indeed an invaluable and very versatile tool for those problems where analytical techniques are inadequate. Although an impressive technique, it provides only statistical estimates rather than exact results, and it only compares alternatives rather than generating the optimal one.
- Simulation is also a slow and costly way to study a problem. It usually requires a large amount of time and great expense for analysis and programming.
- Finally, simulation yields only numerical data about the performance of the system, and sensitivity analysis of the model parameters is very expensive.
-
Monte Carlo Methods
A Monte Carlo method is a computational / numerical method that uses random numbers to compute / estimate a quantity of interest. The quantity of interest may be the mean of a random variable, functions of several means, the distribution of a random variable, or a high-dimensional integral.
Basically, Monte Carlo methods may be grouped into two types:
The direct/simple/classical Monte Carlo methods involve generating independent and identically distributed (i.i.d.) random samples. The other group, the Markov Chain Monte Carlo (MCMC) methods, involves generating a sequence of random samples that are not independent.
Monte Carlo Integration
Let [katex] f(x) [/katex] be a function of x and suppose we are interested in computation of the integral:
I= \int_0^1 f(x) dx
We can write the integral as,
I=\int_0^1 f(x) p(x) dx =E(f(X)) \quad \textit{; where } p(x)=1 \textit{ is the pdf of a r.v. } X \sim Unif(0,1)
Now suppose that [katex] x_1,x_2,\dots,x_n [/katex] are independent random samples drawn from Uniform(0,1); then by the law of large numbers,
\frac{1}{n} \sum_{i=1}^n f(x_i) \rightarrow E(f(X))
Thus an estimator of I may be:
\hat{I}=\frac{1}{n} \sum_{i=1}^n f(x_i)
On a more general note, if [katex] a < b < \infty [/katex] then,
I= \int_a^b f(x) dx \\ \textit{Taking } y=\frac{x-a}{b-a}: \\ I= \int_0^1 (b-a)\, f \left(a+(b-a)y \right) dy = \int_0^1 h(y)\, dy \\ \text{; where } h(y)=(b-a)\, f \left(a+(b-a)y \right) \\ = E\left[ h(Y) \right], \quad Y \sim Unif(0,1)
And when [katex] a=0, b=\infty [/katex], taking [katex] y=\frac{1}{x+1} [/katex],
I= \int_0^\infty f(x) dx = - \int_1^0 f\left( \frac{1}{y} -1 \right) \frac{dy}{y^2} = \int_0^1 h(y)\, dy \\ \text{; where } h(y) = f\left( \frac{1}{y} -1 \right)/y^2 \\ =E\left[h(Y)\right], \quad Y \sim Unif(0,1)
Some advantages of using Monte Carlo methods:
- Estimates can be obtained without strong distributional assumptions.
- The rate of convergence of a Monte Carlo estimator, [katex] O(n^{-1/2}) [/katex], may not be on par with that of other numerical methods in low dimensions. However, this rate does not deteriorate with dimension, so for high-dimensional integrals Monte Carlo methods still provide estimators that converge to the exact value.
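The estimator [katex] \hat{I} [/katex] above can be sketched in a few lines. The integrand below ([katex] e^x [/katex]) is just an illustrative choice, picked because the true value of the integral is known:

```python
import math
import random

random.seed(0)

def mc_integral(f, n=100_000):
    """Simple Monte Carlo estimate of I = integral of f over (0, 1)."""
    # I-hat = (1/n) * sum f(x_i), with x_i ~ Unif(0,1) i.i.d.
    return sum(f(random.random()) for _ in range(n)) / n

# Example: the integral of e^x over (0,1) equals e - 1 (about 1.71828)
est = mc_integral(math.exp)
```

By the law of large numbers the estimate approaches e − 1 as n grows, with a standard error shrinking like n^(−1/2).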
-
Operations Research: The ORIGIN
The origin of Operations Research can be traced back to the World War II period. During the war, the military management in England called upon a team of scientists to study the strategic and tactical problems related to the air and land defence of the country. Since military resources were limited, it was up to these scientists to decide upon their most effective utilization, e.g., efficient ocean transport, effective bombing, etc.
During WWII, the Military Commands of the U.K. and U.S.A. engaged several inter-disciplinary teams of scientists to undertake scientific research into strategic and tactical military operations. Their mission was to formulate specific proposals and plans for aiding the Military Commands to arrive at decisions on the optimal utilization of scarce military resources and efforts, and also to implement those decisions effectively. The OR teams were not actually engaged in military operations or in fighting the war; they were only advisors, yet they were significantly instrumental in winning the war, to the extent that the scientific and systematic approaches involved in OR provided good intellectual support to the strategic initiatives of the military commands. Hence, OR can be associated with "an art of winning the war without actually fighting it".
As the name implies, 'Operations Research' was apparently so named because the team was dealing with research on (military) operations. The work of this team of scientists was named Operations Research in England.
The encouraging results obtained by the British OR teams also motivated the United States military management to start similar activities. Successful applications of the U.S. teams included the invention of new flight patterns, the planning of sea mining, and the effective utilization of electronic equipment. The work of OR teams was given various names in the United States, such as Operational Analysis, Operations Evaluation, Operations Research, Systems Analysis, Systems Evaluation, Systems Research and Management Science. The name Operations Research, though, is the most widely used.
Though OR initially started with the intention of utilizing limited military resources efficiently during war, after the end of WWII it got the attention of industrial managers who were eagerly looking for solutions to their complex executive-type problems. The most common was: "What methods should be adopted so that the total cost is minimum, or the total profit maximum?" The first mathematical technique in this field, called the Simplex Method of Linear Programming, was developed in 1947 by the American mathematician George B. Dantzig. Since then, new techniques and applications have been developed through the efforts and co-operation of interested individuals in academic institutions and industry.
Today, the impact of OR can be felt in many areas. A large number of management consulting firms are currently engaged in OR activities. Apart from military and business applications, OR activities extend to transportation systems, libraries, hospitals, city planning, financial institutions, etc. Indian industries making use of OR include Delhi Cloth Mills, Indian Railways, Indian Airlines, etc.
While making use of the techniques of OR, a mathematical model of the problem is formulated. This model is a simplified representation of the problem in which, for reasons of simplicity, only the most important features are considered. Then an optimal or most favourable solution is found. Since the model is an idealized rather than an exact representation of the real problem, the optimal solution so obtained may not prove to be the best solution to the actual problem. Although extremely complex and highly accurate mathematical models could be formed, these are often avoided because they may not be easily solvable. So, from both the cost-minimising and the mathematical-simplicity points of view, it seems beneficial to develop a less accurate but simpler model, and to find a sequence of solutions consisting of a series of increasingly better approximations to the actual course of action. The apparent weaknesses of the initial solution are then used to suggest improvements in the model, its input data, and the solution procedure. A new solution is thus obtained, and the process is repeated until the improvements in succeeding solutions become so small that further refinement does not seem economical.
If the model is carefully formulated and tested, the resulting solution should be a good approximation to the ideal course of action for the real problem. Although we may not get the best answers, we are definitely able to avoid the bad answers where worse ones exist. Thus, operations research techniques can save us from the worst situations of practical life.
-
JAM 2022 [ 51-60 ]
[Q No. 51-60]
Let [katex] X_1,X_2,\dots,X_9 [/katex] be a random sample from a [katex] N(\mu,\sigma^2) [/katex] distribution, where [katex] \mu \in \mathbb{R} \text{ and } \sigma>0 [/katex] are unknown. Let the observed values of [katex] \bar{X} = \frac{1}{9} \sum_{i=1}^9 X_i \text{ and } S^2 = \frac{1}{8} \sum_{i=1}^{9} ( X_i -\bar{X} )^2 [/katex] be 9.8 and 1.44, respectively. If the likelihood ratio test is used to test the hypothesis [katex] H_0: \mu=8.8 \text{ against } H_1:\mu>8.8, [/katex] then the p-value of the test equals __________________ (round off to 3 decimals)
We know that,
Y=\frac{\sqrt{9}(\bar{X}-\mu)}{\sigma} \sim N(0,1) \quad \text{and} \quad Z^2=\frac{8S^2}{\sigma^2} \sim \chi^2_{(8)} \text{ independently}
Thus, the test statistic to be used for testing the null against the alternative is:
T = \frac{Y}{Z/\sqrt{8}} = \frac{\sqrt{9}(\bar{X}-\mu_0)}{S} \sim t_{(8)}
Judging from the alternative, the rule of rejection of the null hypothesis is: reject H_0 for large observed values of T.
Now, the p-value of the test is given as:
Pr [T > T_{obs} \mid H_0 \text{ is true} ] = Pr \left[ T > \frac{ \sqrt{9}(9.8 - 8.8)}{\sqrt{1.44} } \right] = Pr \left[T > \frac{3}{1.2} \right] = Pr[T > 2.5] = 0.0185 -
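The p-value above can be spot-checked numerically. The sketch below simulates the [katex] t_{(8)} [/katex] distribution directly from its definition (a standard normal divided by the square root of an independent chi-square over its degrees of freedom), rather than using statistical tables:

```python
import math
import random

random.seed(0)

# Observed test statistic: T = sqrt(n) * (xbar - mu0) / s
t_obs = math.sqrt(9) * (9.8 - 8.8) / 1.2     # = 2.5

def t8_upper_tail(t, n_sim=200_000):
    """Estimate Pr[T > t] for T ~ t_(8) via T = Z / sqrt(chi2_8 / 8)."""
    count = 0
    for _ in range(n_sim):
        z = random.gauss(0.0, 1.0)
        chi2 = sum(random.gauss(0.0, 1.0) ** 2 for _ in range(8))
        if z / math.sqrt(chi2 / 8) > t:
            count += 1
    return count / n_sim

p_value = t8_upper_tail(t_obs)   # close to 0.0185, up to Monte Carlo error
```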
HYPERGEOMETRIC DISTRIBUTION
Suppose an urn contains N coloured balls, of which M are blue and the rest are red. Suppose we draw a sample of n balls from the urn at random and without replacement. If X is the random variable denoting the number of blue balls drawn, then the probability that exactly x blue balls are drawn is given by:
p(x ; n,N,M) = Pr[X=x] = \frac{\binom M x \binom {N-M}{n-x}}{\binom N n} \quad ; \max(0,\, n-N+M) \le x \le \min(n,M)
EXPECTATION
E(X)= \sum_{x} x\, p(x;n,N,M) = \sum_{x \ge 1} x\, \frac{\binom M x \binom {N-M}{n-x}}{\binom N n} \\ = \frac{M}{\binom N n} \sum_{x \ge 1} \binom {M-1}{x-1} \binom {(N-1)-(M-1)}{(n-1)-(x-1)} \quad \left[\because x\binom{M}{x} = M\binom{M-1}{x-1}\right] \\ = \frac{M}{\binom N n} \binom {N-1}{n-1} \quad \left[\text{Vandermonde's identity}\right] \\ = M\, \frac{(N-1)!}{(n-1)!(N-n)!}\, \frac{n!(N-n)!}{N!} = n \frac{M}{N}
FACTORIAL MOMENTS
For any integer [katex] s=0,1,\dots,\min(n,M) [/katex], we have,
\mu_{[s]}=E\left[X(X-1)\dots(X-s+1)\right] = \sum_{x} \frac{x!}{(x-s)!}\, p(x; n,N,M) \\ = \sum_{x \ge s} \frac{x!}{(x-s)!} \frac{M!}{x!(M-x)!} \frac{ \binom {N-M}{n-x}}{\binom N n} \\ = \frac{M!}{(M-s)!} \frac{1}{\binom N n} \sum_{x \ge s} \binom {M-s}{x-s} \binom {N-M}{n-x} \\ = \frac{M!}{(M-s)!} \frac{1}{\binom N n} \sum_{x \ge 0} \binom {M-s}{x} \binom {(N-s)-(M-s)}{(n-s)-x} \\ = \frac{M!}{(M-s)!} \frac{\binom {N-s}{n-s}}{\binom N n} \quad \left[\text{Vandermonde's identity}\right] \\ = \frac{M!}{(M-s)!} \frac{(N-s)!}{N!} \frac{n!}{(n-s)!}
VARIANCE:
E[X(X-1)] = \mu_{[2]} = \frac{M!}{(M-2)!} \frac{(N-2)!}{N!} \frac{n!}{(n-2)!} = \frac{n(n-1)M(M-1)}{N(N-1)}
E(X^2) = E[X(X-1)] +E(X) = \frac{n(n-1)M(M-1)}{N(N-1)} + \frac{nM}{N} = \frac{nM}{N(N-1)} \left[ (n-1)(M-1) + (N-1) \right]
V(X) = E(X^2) - [E(X)]^2 = \frac{nM}{N(N-1)} \left[ (n-1)(M-1) + (N-1) \right] - \frac{n^2M^2}{N^2} \\ = \frac{nM}{N^2 (N-1)} \left[ N(n-1)(M-1) + N(N-1) - nM(N-1) \right] \\ = \frac{nM}{N^2 (N-1)} \left[ nNM-nN-NM+N + N^2-N - nMN+nM \right] \\ = \frac{nM}{N^2 (N-1)} \left[ N^2 -nN -NM +nM \right] = \frac{nM}{N^2 (N-1)} (N-n)(N-M) \\ = n \frac{M}{N}\, \frac{N-M}{N}\, \frac{N-n}{N-1}
THIRD AND FOURTH ORDER MOMENTS:
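The mean and variance formulas above can be spot-checked empirically. The sketch below simulates the urn experiment directly (the urn sizes N = 20, M = 8, n = 5 are arbitrary illustrative choices):

```python
import random

random.seed(1)

N, M, n = 20, 8, 5                       # urn size, blue balls, sample size
balls = [1] * M + [0] * (N - M)          # 1 = blue, 0 = red

# Draw many without-replacement samples and count the blue balls in each
draws = [sum(random.sample(balls, n)) for _ in range(100_000)]

mean = sum(draws) / len(draws)
var = sum((d - mean) ** 2 for d in draws) / len(draws)

expected_mean = n * M / N                                      # = 2.0
expected_var = n * (M / N) * ((N - M) / N) * ((N - n) / (N - 1))  # about 0.947
```

The empirical mean and variance should land close to nM/N and n(M/N)((N−M)/N)((N−n)/(N−1)), confirming the derivations.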
Provided that the specified moments exist, it is straightforward (though tedious) to show via the factorial moments that (see Univariate Discrete Distributions by Johnson, Kemp and Kotz):
\mu_3= \frac{nM(N-M)(N-n)(N-2M)(N-2n)}{N^3(N-1)(N-2)}
\mu_4 = \frac{nM(N-M)(N-2n)}{N^3(N-1)(N-2)(N-3)} \left[ N(N+1)-6n(N-n) + \frac{M(N-M)}{N^2} \left( N^2(n-2) -Nn^2 +6n(N-n) \right) \right]
It's really tedious!!!
\mu_{[3]}=E[X(X-1)(X-2)] = \frac{n(n-1)(n-2)\, M(M-1)(M-2)}{N(N-1)(N-2)}
\mu'_3 = E(X^3) = \mu_{[3]} + 3 \mu'_2 - 2 \mu'_1 \\ \textit{; where } \mu'_1 = \mu = \frac{nM}{N} \textit{ and } \mu'_2 = \mu_2 + \mu^2 \textit{ with } \mu_2 = V(X)
\mu_3 = \mu'_3 -3 \mu'_2\, \mu'_1 +2 (\mu'_1)^3 = \mu_{[3]}+3(\mu_2 + \mu^2) -2\mu-3 (\mu_2 + \mu^2) \mu +2 \mu^3 \\ = \left( \mu_{[3]} +3 \mu^2 -2\mu - \mu^3 \right) +3\mu_2 (1 - \mu)
Substituting [katex] \mu = \frac{nM}{N} [/katex] and [katex] \mu_2 = n \frac{M}{N} \frac{N-M}{N} \frac{N-n}{N-1} [/katex], then expanding and simplifying the (lengthy) algebra term by term finally gives:
\mu_3 = \frac{nM(N-M)(N-n)(N-2M)(N-2n)}{N^3(N-1)(N-2)}
A similar (even more tedious) computation via [katex] \mu_{[4]} [/katex] gives the expression for [katex] \mu_4 [/katex] stated above. -
Survival Analysis: Competing Risk Theory
INTRODUCTION
A distinctive feature of survival data is the concept of censoring. An implicit concept in the definition of censoring is that, if the study had been prolonged (or if subjects had not dropped out), eventually the outcome of interest would have been observed to occur for all the subjects. Conventional statistical methods for the analysis of survival data make the important assumption of independent or non-informative censoring. This means that at a given point in time, subjects who remain under follow-up have the same future risk for the occurrence of the event as those subjects who are no longer being followed (either because of censoring or study dropout), as if losses to follow-up were random and thus non-informative.
A competing risk is an event whose occurrence precludes the occurrence of the primary event of interest. For instance, in a study in which the primary outcome is time to death due to a cardiovascular cause, a death due to a non-cardiovascular cause serves as a competing risk.
Conventional statistical methods for the analysis of survival data assume that competing risks are absent. Two competing risks are said to be independent if information about a subject's risk of experiencing one type of event provides no information about the subject's risk of experiencing the other type of event. The methods described later will cover both the case in which the competing risks are independent of one another and the case in which they are not.
In biomedical applications, the biology often suggests at least some dependence between competing risks, which in many cases may be quite strong. Accordingly, independent competing risks may be relatively rare in biomedical applications.
When analyzing survival data in which competing risks are present, analysts frequently censor subjects when a competing event occurs. Thus, when the outcome is time to death attributable to cardiovascular causes, an analyst may consider a subject as censored once that subject dies of noncardiovascular causes. However, censoring subjects at the time of death attributable to noncardiovascular causes may be problematic.
First, it may violate the assumption of noninformative censoring: it may be unreasonable to assume that subjects who died of noncardiovascular causes (and were thus treated as censored) can be represented by those subjects who remained alive and had not yet died of any cause.
Second, even when the competing events are independent, censoring subjects at the time of the occurrence of a competing event may lead to incorrect conclusions, because the event probability being estimated is interpreted as occurring in a setting where the censoring (e.g., the competing events) does not occur.
In the cardiovascular example described above, this corresponds to a setting where death from noncardiovascular causes is not a possibility. Although such probabilities may be of theoretical interest, they are of questionable relevance in many practical applications, and generally lead to overestimation of the cumulative incidence of an event in the presence of the competing events.
-
STB – 2019
- Consider a count variable X following a Poisson distribution with parameter θ > 0, where the zero count (i.e., X = 0) is not observable. We have n observations X1, . . . , Xn from this distribution. Let [katex] \bar{X} [/katex] denote the sample mean.
a) Derive the quantity for which [katex] \bar{X} [/katex] is an unbiased estimator.
b) Suppose that the observed value of [katex] \bar{X} [/katex] is strictly greater than 1. Show that the likelihood function of θ has a unique maximiser.
We know that X follows a Poisson distribution with parameter [katex] \theta [/katex] and X=0 is not observable. Under this condition the probability mass function (pmf) of X is given by:
g(x \mid \theta ) =Pr[ X=x \mid X \neq 0] = \frac{Pr[X=x]}{Pr[X \neq 0]} = \frac{f(x)}{1-f(0)} = \frac{\theta^x e^{-\theta}} {x!\left(1-e^{-\theta} \right)} \quad ,x=1,2,3,\dots
(a) Now,
E (X)= \sum_{x=1}^\infty x\, g(x \mid \theta) = \sum_{x=1}^\infty x\, \frac{\theta^x e^{-\theta}} {x!\left(1-e^{-\theta} \right)} = \sum_{x=1}^\infty \frac{\theta^x e^{-\theta}} {(x-1)!\left(1-e^{-\theta} \right)} \\ = \frac{\theta e^{-\theta}} {1-e^{-\theta}}\sum_{x=1}^\infty\frac{\theta^{x-1}} {(x-1)!} = \frac{\theta e^{-\theta}} {1-e^{-\theta}}\sum_{x=0}^\infty\frac{\theta^{x}} {x!} = \frac{\theta} {1-e^{-\theta}}
Also, as [katex] X_1,\dots,X_n [/katex] are random samples drawn from this distribution,
E(\bar{X})= \frac{1}{n} \sum_{i=1}^n E(X_i) = \frac{\theta} {1-e^{-\theta}} \quad \left[ \text{i.e. } \bar{X} \text{ is an unbiased estimator of } \frac{\theta} {1-e^{-\theta}} \right]
(b) The likelihood function of [katex] x_1,x_2,\dots,x_n [/katex] is given by:
L(\theta)= \prod_{i=1}^n g(x_i \mid \theta ) = \prod_{i=1}^n \frac{1}{1-e^{-\theta}}\, \frac{ \theta^{x_i} e^{-\theta}}{x_i !} = \frac{1}{(e^{\theta}-1)^n} \frac{\theta^{\sum x_i }}{\prod_{i=1}^{n}x_i !}
Taking 'ln' on both sides we get,
l(\theta)=\ln L(\theta) = -n \ln(e^{\theta}-1)+n \bar{x} \ln \theta - \ln\left(\prod_{i=1}^n x_i !\right)
Differentiating w.r.t. [katex] \theta [/katex] gives,
\frac{d}{d\theta}l(\theta) = -\frac{n e^{\theta}}{e^{\theta}-1} + \frac{n\bar{x}}{\theta} = -\frac{n}{1-e^{-\theta}} + \frac{n\bar{x}}{\theta} \\ \frac{d^2}{d\theta^2}l(\theta) = \frac{n e^{-\theta}}{\left(1-e^{-\theta}\right)^2}-\frac{n\bar{x}}{\theta^2}
Setting the first derivative to zero,
\left[\frac{d}{d\theta}l(\theta) \right]_{\theta=\hat{\theta}} =0 \Rightarrow \frac{\hat{\theta}}{1-e^{-\hat{\theta}}} = \bar{x}
The function [katex] g(\theta)=\frac{\theta}{1-e^{-\theta}} [/katex] is strictly increasing on [katex] (0,\infty) [/katex], with [katex] g(\theta) \to 1 [/katex] as [katex] \theta \to 0 [/katex] and [katex] g(\theta) \to \infty [/katex] as [katex] \theta \to \infty [/katex]; so for [katex] \bar{x}>1 [/katex] this equation has exactly one root [katex] \hat{\theta} [/katex]. Moreover, substituting [katex] \bar{x} = \hat{\theta}/(1-e^{-\hat{\theta}}) [/katex],
\left[\frac{d^2}{d\theta^2}l(\theta) \right]_{\theta=\hat{\theta}}= \frac{n e^{-\hat{\theta}}}{\left(1-e^{-\hat{\theta}}\right)^2}-\frac{n}{\hat{\theta}\left(1-e^{-\hat{\theta}}\right)} = \frac{n}{\hat{\theta}\left(1-e^{-\hat{\theta}}\right)^2}\left[ \hat{\theta} e^{-\hat{\theta}} - \left(1-e^{-\hat{\theta}}\right) \right]<0
since [katex] \theta e^{-\theta} < 1-e^{-\theta} [/katex] (equivalently, [katex] \theta < e^{\theta}-1 [/katex]) for all [katex] \theta > 0 [/katex]. Thus, the likelihood function of θ has a unique maximiser.
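Since the likelihood equation has no closed-form solution, the MLE must be found numerically. A sketch using bisection, exploiting the monotonicity of g(θ) = θ/(1 − e^(−θ)); the sample mean 2.5 below is a hypothetical illustrative value:

```python
import math

def g(theta):
    """g(theta) = theta / (1 - exp(-theta)); strictly increasing, g(0+) = 1."""
    return theta / (1.0 - math.exp(-theta))

def truncated_poisson_mle(xbar, lo=1e-9, hi=50.0):
    """Solve g(theta) = xbar for theta by bisection (requires xbar > 1)."""
    for _ in range(200):
        mid = (lo + hi) / 2.0
        if g(mid) < xbar:
            lo = mid       # root lies to the right of mid
        else:
            hi = mid       # root lies to the left of mid
    return (lo + hi) / 2.0

theta_hat = truncated_poisson_mle(2.5)   # hypothetical observed sample mean
```

Because g is strictly increasing with g(0+) = 1, bisection is guaranteed to converge to the unique root whenever the observed mean exceeds 1, mirroring the uniqueness argument above.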
-
Ratio Estimation
INTRODUCTION:
In survey sampling, we often use information on some auxiliary variable to improve our estimator of a finite population parameter, by giving the estimator protection against the selection of a bad sample. One such estimator is the ratio estimator, introduced as follows:
In practice, we are often interested to estimate the ratio of the type:
R= \frac{\bar{Y}}{\bar{X}} = \frac{\sum Y_i}{\sum X_i}
For example, in various socio-economic surveys, we may be interested in the per-capita expenditure on food items, the infant mortality rate, the literacy rate, etc. So the estimation of R is itself of interest to us; besides that, we can get an improved estimate of [katex] \bar{Y} [/katex], as follows:
NOTATIONS:
- Population size: [katex] N [/katex]
- Population: [katex] U=(U_1,U_2,\dots,U_N) [/katex]
- Study variable: [katex] Y=(Y_1,Y_2,\dots,Y_N) [/katex]
- Auxiliary variable: [katex] X=(X_1,X_2,\dots,X_N) [/katex]
- Mean of Y (study variable): [katex] \bar{Y} = \frac{1}{N} \sum Y_\alpha [/katex]
- Mean of X (auxiliary variable): [katex] \bar{X} = \frac{1}{N} \sum X_\alpha [/katex]
- Sample size: [katex] n [/katex]
- Sample (drawn by SRSWOR from U): [katex] s=(i_1,i_2,\dots,i_n) [/katex]
- Sample mean of Y (study variable): [katex] \bar{y} = \frac{1}{n} \sum y_i [/katex]
- Sample mean of X (auxiliary variable): [katex] \bar{x} = \frac{1}{n} \sum x_i [/katex]
Since [katex] \bar{Y} = R \bar{X} [/katex], where R is unknown but [katex] \bar{X} [/katex] is known, we can take
\hat{\bar{Y}}_R = \hat{R} \bar{X} = \frac{\bar{y}}{\bar{x}} \bar{X}
as an estimate of [katex] \bar{Y} [/katex], and this estimator is called the ratio estimator of [katex] \bar{Y} [/katex].
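A small simulation sketch comparing the ratio estimator with the plain sample mean under SRSWOR. The population below is artificial, constructed so that Y is roughly proportional to X (the situation in which the ratio estimator is expected to help):

```python
import random

random.seed(2)

N, n = 1000, 50
X = [random.uniform(10.0, 50.0) for _ in range(N)]
Y = [2.0 * x + random.gauss(0.0, 3.0) for x in X]   # Y is close to R*X, R about 2

Xbar = sum(X) / N    # known population mean of the auxiliary variable
Ybar = sum(Y) / N    # target quantity (unknown in practice)

reps = 20_000
sse_ratio = sse_mean = 0.0
for _ in range(reps):
    s = random.sample(range(N), n)                    # SRSWOR sample of labels
    ybar = sum(Y[i] for i in s) / n
    xbar = sum(X[i] for i in s) / n
    sse_ratio += ((ybar / xbar) * Xbar - Ybar) ** 2   # ratio estimator error
    sse_mean += (ybar - Ybar) ** 2                    # plain mean error

mse_ratio = sse_ratio / reps
mse_mean = sse_mean / reps
# When Y is nearly proportional to X, mse_ratio comes out far below mse_mean
```

This illustrates the "protection against a bad sample" mentioned above: a sample with an unusually large x̄ tends to also have a large ȳ, and dividing by x̄ cancels much of that sampling fluctuation.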
RESULT 1:
[katex] \hat{\bar{Y}}_R [/katex] is not an unbiased estimator of [katex] \bar{Y} [/katex] and its approximate bias is given by:
B(\hat{\bar{Y}}_R) = \frac{1}{\bar{X}} \frac{1-f}{n} \left( RS_X^2 - S_{XY} \right) \\ \textit{; where } S_X^2= \frac{1}{N-1} \sum_{i=1}^{N} \left( X_i - \bar{X} \right)^2 , \quad S_{XY}= \frac{1}{N-1} \sum_{i=1}^{N} \left( X_i - \bar{X} \right) \left(Y_i - \bar{Y} \right)
RESULT 2:
\frac{\left| B( \hat{\bar{Y}}_R) \right|}{ \sigma_{\hat{\bar{Y}}_R}} \leq \left| C.V.(\bar{x}) \right|
RESULT 3:
Mean square error of [katex] \hat{\bar{Y}_R} [/katex] is given by:
MSE(\hat{\bar{Y}}_R) = \frac{1-f}{n} \left[ S_Y^2 + R^2 S_X^2 - 2 R S_{XY} \right]
RESULT 4:
In SRSWOR, for large n, an approximation of the variance of [katex] \hat{\bar{Y}_R} [/katex] is given by:
V(\hat{\bar{Y}}_R) = \frac{1-f}{n} \frac{1}{N-1} \sum_{i=1}^{N} U_i^2 \\ \textit{; where} \quad U_i= Y_i-RX_i, \forall i=1(1)N -
Linear Algebra
Definition 1 (Vector Space). A vector space over a field F is a quadruple (V, +, ., F) satisfying the following axioms for all α, β ∈ F and x, y, z ∈ V :
1. (V, +) is a commutative group, that is,
(a) ’+’ is a map from V × V to V. [Closure]
(b) (x + y) + z = x + (y + z). [Associative]
(c) there exists an element 0 of V such that x + 0 = 0 + x = x. [Existence of 0]
(d) for each x in V there exists an element −x in V such that x + (−x) = (−x) + x = 0.
[Existence of Negative]
(e) x + y = y + x. [Commutative]
2. ’.’ is a map from F × V to V. [Closure wrt . ]
3. α.(β.x) = (α.β).x.
4. 1.x=x
5. (α + β).x = (α.x) + (β.x). [Distributivity]
6. α.(x + y) = (α.x) + (α.y). [Distributivity]
Remark 1. The elements of V are called vectors and the elements of F are called scalars. F is called the base field or ground field of the vector space. ’0’ in axiom 1(c) is called the null vector or the zero vector. It is to be noted that bold-faced Roman letters are treated as vectors whereas lower-case Greek letters are treated as scalars throughout the course.
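As a quick sanity check, the axioms can be verified numerically for the familiar example of pairs of rationals over the field Q (exact rational arithmetic avoids floating-point round-off; this is only an illustrative spot-check on random elements, not a proof):

```python
import random
from fractions import Fraction

random.seed(3)

def rand_scalar():
    return Fraction(random.randint(-9, 9), random.randint(1, 9))

def rand_vec():
    return (rand_scalar(), rand_scalar())

def add(x, y):                      # vector addition in Q^2
    return (x[0] + y[0], x[1] + y[1])

def smul(a, x):                     # scalar multiplication
    return (a * x[0], a * x[1])

zero = (Fraction(0), Fraction(0))

for _ in range(200):
    x, y, z = rand_vec(), rand_vec(), rand_vec()
    a, b = rand_scalar(), rand_scalar()
    assert add(add(x, y), z) == add(x, add(y, z))         # 1(b) associativity
    assert add(x, zero) == x == add(zero, x)              # 1(c) zero vector
    assert add(x, smul(Fraction(-1), x)) == zero          # 1(d) negatives
    assert add(x, y) == add(y, x)                         # 1(e) commutativity
    assert smul(a, smul(b, x)) == smul(a * b, x)          # axiom 3
    assert smul(Fraction(1), x) == x                      # axiom 4
    assert smul(a + b, x) == add(smul(a, x), smul(b, x))  # axiom 5
    assert smul(a, add(x, y)) == add(smul(a, x), smul(a, y))  # axiom 6
```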