java - Checking the normal distribution hypothesis for a discrete dataset


I'm a newbie in statistics, so I guess I might be missing something obvious here.

Basically, I want to examine whether a double array of integer values (a histogram) conforms to a normal distribution (with a specified mean and standard deviation) at some significance level, based on the statistical tests in Apache Commons Math.

What I already understand is that the common way is to calculate a p-value and then decide whether the null hypothesis should be rejected.
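The decision rule described above can be written down in a few lines. This is only a minimal sketch: the class and method names are made up for illustration, and the p-values in `main` are arbitrary placeholders rather than results from any test.

```java
final class PValueDecision {
    /**
     * Returns true when the null hypothesis is rejected at significance level alpha.
     * Hypothetical helper for illustration, not part of Apache Commons Math.
     */
    static boolean rejectNull(double pValue, double alpha) {
        return pValue < alpha;
    }

    public static void main(String[] args) {
        double alpha = 0.05;
        System.out.println(rejectNull(0.20, alpha)); // false: not enough evidence to reject
        System.out.println(rejectNull(0.01, alpha)); // true: reject the null hypothesis
    }
}
```

Note that a p-value above the significance level only means the null hypothesis is not rejected; it does not prove that the hypothesis is true.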

My first "baby" step is to check whether two arrays come from the same distribution using the one-way ANOVA test (the second part is taken from the example in the documentation):

    import java.util.ArrayList;
    import java.util.List;
    import java.util.Random;
    import org.apache.commons.math3.stat.inference.TestUtils;

    double[] samples1 = new double[100];
    double[] samples2 = new double[100];

    // Fill two histograms: bin (value - 1) counts how often that integer value was drawn
    Random rand = new Random();
    for (int i = 0; i < 100000; i++) {
        int index1 = (int) (rand.nextGaussian() * 5 + 50);
        int index2 = (int) (rand.nextGaussian() * 5 + 50);
        try {
            samples1[index1 - 1]++;
        } catch (ArrayIndexOutOfBoundsException e) {}
        try {
            samples2[index2 - 1]++;
        } catch (ArrayIndexOutOfBoundsException e) {}
    }

    List<double[]> classes = new ArrayList<>();
    classes.add(samples1);
    classes.add(samples2);

    double pvalue = TestUtils.oneWayAnovaPValue(classes);
    boolean fail = TestUtils.oneWayAnovaTest(classes, 0.05);

    System.out.println(pvalue);
    System.out.println(fail);

The result is:

    1.0
    false

Assuming a significance level of 0.05, I can deduce that the hypothesis is true (i.e. both arrays come from the same distribution), since p > 0.05.

Now let's take the Kolmogorov-Smirnov test. The example code in the documentation shows how to check a single array against a NormalDistribution object (that is my goal). However, it also allows checking two arrays against each other. I cannot get a proper result in either case. For example, let's adapt the above example to K-S:
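For reference, the quantity behind the two-array case is the largest gap between the two empirical distribution functions, where every array element counts as one observation. The sketch below computes that statistic by hand with the standard library only. It is not the Commons Math implementation, the class and method names are invented for illustration, and it does not compute a p-value.

```java
import java.util.Arrays;

final class KsStatisticSketch {
    /**
     * Two-sample Kolmogorov-Smirnov statistic D = max |F1(x) - F2(x)|,
     * where F1 and F2 are the empirical CDFs of x and y.
     * Each array element is treated as a single observation.
     */
    static double ksStatistic(double[] x, double[] y) {
        double[] a = x.clone();
        double[] b = y.clone();
        Arrays.sort(a);
        Arrays.sort(b);
        int i = 0;
        int j = 0;
        double d = 0.0;
        while (i < a.length && j < b.length) {
            // Advance both samples past the current smallest value (handles ties)
            double v = Math.min(a[i], b[j]);
            while (i < a.length && a[i] <= v) i++;
            while (j < b.length && b[j] <= v) j++;
            double gap = Math.abs((double) i / a.length - (double) j / b.length);
            d = Math.max(d, gap);
        }
        return d;
    }

    public static void main(String[] args) {
        double[] first = {1, 2, 3, 4};
        double[] second = {3, 4, 5, 6};
        System.out.println(ksStatistic(first, second)); // 0.5
    }
}
```

Because each element is one observation, what the statistic measures depends on whether the arrays hold raw draws or per-bin counts.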

    import java.util.Random;
    import org.apache.commons.math3.stat.inference.TestUtils;

    double[] samples1 = new double[100];
    double[] samples2 = new double[100];

    // Same histogram construction as in the ANOVA example
    Random rand = new Random();
    for (int i = 0; i < 100000; i++) {
        int index1 = (int) (rand.nextGaussian() * 5 + 50);
        int index2 = (int) (rand.nextGaussian() * 5 + 50);
        try {
            samples1[index1 - 1]++;
        } catch (ArrayIndexOutOfBoundsException e) {}
        try {
            samples2[index2 - 1]++;
        } catch (ArrayIndexOutOfBoundsException e) {}
    }

    double pvalue = TestUtils.kolmogorovSmirnovTest(samples1, samples2);
    boolean fail = pvalue < 0.05;

    System.out.println(pvalue);
    System.out.println(fail);

The result is:

    7.475142727031425E-11
    true

My question is: why is the p-value for essentially the same data so small? Does it mean this test is not suited for this type of data?

Should I:

  • generate a reference array from a NormalDistribution (that is, with the specified mean and standard deviation) and then compare it with my array using the one-way ANOVA test (or another test), or
  • somehow adapt my data and then use K-S to compare a single array against a NormalDistribution object?
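One possible reading of "adapt my data" in the second option is to turn the histogram back into a flat array of observations, so that each array element is one drawn value rather than a bin count. The sketch below does this with the standard library only. The class and method names are hypothetical, and the bin-to-value mapping (bin `i` holds the value `i + 1`) is an assumption taken from the `samples[index - 1]++` indexing above.

```java
import java.util.Arrays;

final class HistogramExpansion {
    /**
     * Expands a histogram into raw observations.
     * Assumes counts[i] is the number of times the value (i + 1) was observed.
     */
    static double[] expand(double[] counts) {
        int total = 0;
        for (double c : counts) {
            total += (int) c;
        }
        double[] data = new double[total];
        int pos = 0;
        for (int i = 0; i < counts.length; i++) {
            int n = (int) counts[i];
            // Write n copies of the value represented by bin i
            Arrays.fill(data, pos, pos + n, i + 1);
            pos += n;
        }
        return data;
    }

    public static void main(String[] args) {
        double[] counts = {2, 0, 3};
        System.out.println(Arrays.toString(expand(counts))); // [1.0, 1.0, 3.0, 3.0, 3.0]
    }
}
```

The expanded array is the kind of input that the one-sample call from the documentation example, `TestUtils.kolmogorovSmirnovTest(new NormalDistribution(mean, sd), data)`, takes as its data argument. Whether K-S behaves well on integer data with this many ties is a separate question that this sketch does not address.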

