java - Checking the normal distribution hypothesis for a discrete dataset
I am a newbie in statistics, so I guess I might be missing something obvious here.
Basically, I want to examine whether a double array of integer counts (a histogram) conforms to a normal distribution (with the mean and standard deviation specified) at some significance level, based on the statistical tests in Apache Commons Math.
What I understand is that the common way is to calculate a p-value and then decide whether or not to reject the null hypothesis.
My first "baby" step is to check whether two arrays come from the same distribution using the one-way ANOVA test (the second part is taken from the example in the documentation):
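
    import java.util.ArrayList;
    import java.util.List;
    import java.util.Random;
    import org.apache.commons.math3.stat.inference.TestUtils; // Commons Math 3.x

    double[] samples1 = new double[100];
    double[] samples2 = new double[100];

    Random rand = new Random();
    for (int i = 0; i < 100000; i++) {
        // Draw from N(50, 5) and truncate to an integer bin index
        int index1 = (int) (rand.nextGaussian() * 5 + 50);
        int index2 = (int) (rand.nextGaussian() * 5 + 50);
        try {
            samples1[index1 - 1]++;
        } catch (ArrayIndexOutOfBoundsException e) {
            // value falls outside the 100 bins: ignore it
        }
        try {
            samples2[index2 - 1]++;
        } catch (ArrayIndexOutOfBoundsException e) {
            // value falls outside the 100 bins: ignore it
        }
    }

    List<double[]> classes = new ArrayList<>();
    classes.add(samples1);
    classes.add(samples2);

    double pvalue = TestUtils.oneWayAnovaPValue(classes);
    boolean fail = TestUtils.oneWayAnovaTest(classes, 0.05);

    System.out.println(pvalue);
    System.out.println(fail);

The result is:

    1.0
    false

Assuming a significance level of 0.05, I take it that the null hypothesis is not rejected (i.e. both arrays come from the same distribution), since p > 0.05.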
Now let's take the Kolmogorov-Smirnov test. The example code in the documentation shows how to check a single array against a NormalDistribution object (that is my goal). It also allows checking two arrays against each other. I cannot get a proper result in either case. For example, let's adapt the above example to K-S:
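
    import java.util.Random;
    import org.apache.commons.math3.stat.inference.TestUtils; // Commons Math 3.x

    double[] samples1 = new double[100];
    double[] samples2 = new double[100];

    Random rand = new Random();
    for (int i = 0; i < 100000; i++) {
        // Draw from N(50, 5) and truncate to an integer bin index
        int index1 = (int) (rand.nextGaussian() * 5 + 50);
        int index2 = (int) (rand.nextGaussian() * 5 + 50);
        try {
            samples1[index1 - 1]++;
        } catch (ArrayIndexOutOfBoundsException e) {
            // value falls outside the 100 bins: ignore it
        }
        try {
            samples2[index2 - 1]++;
        } catch (ArrayIndexOutOfBoundsException e) {
            // value falls outside the 100 bins: ignore it
        }
    }

    // Two-sample K-S test: returns the p-value
    double pvalue = TestUtils.kolmogorovSmirnovTest(samples1, samples2);
    boolean fail = pvalue < 0.05;

    System.out.println(pvalue);
    System.out.println(fail);

The result is:

    7.475142727031425E-11
    true

My question is: why is the p-value for essentially the same data so small? Does it mean this test is not suited to this type of data?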
Should I:
- generate a reference array from a NormalDistribution (that is, with the specified mean and standard deviation) and compare it with my array using the one-way ANOVA test (or another test), or
- somehow adapt my data and use K-S to compare a single array against a NormalDistribution object?
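For the second option, one possible reading of "adapt my data" is to turn the histogram back into raw observations, because the K-S methods treat every array element as one observation rather than as a bin count. Below is a minimal sketch of that idea in plain Java, assuming bin i holds the count of the integer value firstBinValue + i. The names HistogramToSample, expand and firstBinValue are hypothetical and not part of Apache Commons Math.

```java
import java.util.Arrays;

// Hypothetical helper for illustration; not part of Apache Commons Math.
final class HistogramToSample {

    /**
     * Expands bin counts into raw observations: bin i contributes
     * counts[i] copies of the value (firstBinValue + i).
     * Assumes every count is a non-negative whole number.
     */
    static double[] expand(double[] counts, int firstBinValue) {
        int total = 0;
        for (double c : counts) {
            total += (int) Math.round(c);
        }
        double[] sample = new double[total];
        int pos = 0;
        for (int i = 0; i < counts.length; i++) {
            int n = (int) Math.round(counts[i]);
            Arrays.fill(sample, pos, pos + n, firstBinValue + i);
            pos += n;
        }
        return sample;
    }

    public static void main(String[] args) {
        // Bin 0 holds the value 1, matching samples[index - 1]++ in the question.
        double[] counts = {0, 2, 1};
        System.out.println(Arrays.toString(expand(counts, 1))); // prints [2.0, 2.0, 3.0]
    }
}
```

The expanded array could then be passed to TestUtils.kolmogorovSmirnovTest(new NormalDistribution(50, 5), sample), which as far as I know is available in Commons Math 3.3 and later. Since K-S assumes a continuous distribution, integer-valued data with many ties would make the resulting p-value approximate at best, so it should be read with caution.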
 
 
  