I am trying to find a distribution that fits my data (3500+ data points) with a satisfactory goodness of fit (GoF). I use the Kolmogorov-Smirnov (KS) test and its p-value as the GoF measure, treating p > 0.1 as acceptable.
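For reference, the KS statistic here is the maximum distance between the empirical CDF of the data and the CDF of the fitted model. Below is a minimal Python sketch. It assumes continuous data without ties, a power-law tail above a fixed `xmin`, and an exponent estimated by maximum likelihood as in Clauset et al. The function names are hypothetical helpers and are not part of plfit.m or poweRlaw.

```python
import math

def fit_powerlaw_alpha(data, xmin):
    """Maximum-likelihood exponent of a continuous power law for x >= xmin (xmin fixed)."""
    tail = [x for x in data if x >= xmin]
    return 1.0 + len(tail) / sum(math.log(x / xmin) for x in tail)

def ks_distance(data, xmin, alpha):
    """Two-sided KS distance between the tail's empirical CDF and the fitted power-law CDF.
    Assumes continuous data without ties."""
    tail = sorted(x for x in data if x >= xmin)
    n = len(tail)
    d = 0.0
    for i, x in enumerate(tail):
        model_cdf = 1.0 - (x / xmin) ** (1.0 - alpha)
        # Compare the model with the empirical CDF just before and just after the step at x.
        d = max(d, abs((i + 1) / n - model_cdf), abs(i / n - model_cdf))
    return d
```

Because `alpha` is estimated from the same data, comparing `D` against standard KS tables gives an overly optimistic p-value. That is why plpva.m obtains its p-value by bootstrap instead.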
I have tried the plfit.m and plpva.m programs from Clauset et al. to fit a power-law distribution to my data, but the p-value is close to zero, indicating that the power law is not a good fit.
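The p-value that plpva.m reports comes from a bootstrap, not from KS tables. The Python sketch below shows a simplified version of the idea. It assumes continuous data and a fixed `xmin`. plpva.m also re-estimates `xmin` and resamples the values below it for every synthetic dataset, and this sketch omits both steps. The function names are hypothetical.

```python
import math
import random

def fit_alpha(tail, xmin):
    """Continuous power-law MLE for the exponent, with xmin held fixed."""
    return 1.0 + len(tail) / sum(math.log(x / xmin) for x in tail)

def ks_distance(tail, xmin, alpha):
    """Two-sided KS distance between the tail's empirical CDF and the power-law CDF."""
    tail = sorted(tail)
    n = len(tail)
    d = 0.0
    for i, x in enumerate(tail):
        cdf = 1.0 - (x / xmin) ** (1.0 - alpha)
        d = max(d, abs((i + 1) / n - cdf), abs(i / n - cdf))
    return d

def bootstrap_pvalue(data, xmin, n_boot=200, seed=0):
    """Fraction of synthetic power-law samples whose refitted KS distance
    is at least as large as the observed one (simplified: xmin fixed)."""
    rng = random.Random(seed)
    tail = [x for x in data if x >= xmin]
    alpha = fit_alpha(tail, xmin)
    d_obs = ks_distance(tail, xmin, alpha)
    exceed = 0
    for _ in range(n_boot):
        # Draw a synthetic tail of the same size from the fitted model (inverse-CDF sampling).
        synthetic = [xmin * (1.0 - rng.random()) ** (-1.0 / (alpha - 1.0))
                     for _ in range(len(tail))]
        # Refit each synthetic sample before measuring its distance, as for the real data.
        d_syn = ks_distance(synthetic, xmin, fit_alpha(synthetic, xmin))
        if d_syn >= d_obs:
            exceed += 1
    return exceed / n_boot
```

Under this convention, a p-value near zero means that almost no synthetic power-law sample of the same size deviated from its own fit as much as the real data did.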
I also tested the log-normal and exponential distributions using the R package poweRlaw, but I still can't get a p-value above 0.1. However, I think the fitted curves are quite close to the empirical data, as the figure (generated by the poweRlaw package) shows.
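The same KS distance applies to the alternative models. The Python sketch below illustrates it for an exponential tail above a fixed `xmin`, with the rate estimated by maximum likelihood. It is an assumed stand-in for what poweRlaw's exponential model computes, not that package's actual code. The function names are hypothetical.

```python
import math

def fit_exponential_rate(data, xmin):
    """MLE of the rate of an exponential shifted to start at xmin, using only x >= xmin."""
    tail = [x for x in data if x >= xmin]
    return len(tail) / sum(x - xmin for x in tail)

def ks_distance_exponential(data, xmin, rate):
    """Two-sided KS distance between the tail's empirical CDF and the shifted exponential CDF."""
    tail = sorted(x for x in data if x >= xmin)
    n = len(tail)
    d = 0.0
    for i, x in enumerate(tail):
        cdf = 1.0 - math.exp(-rate * (x - xmin))
        d = max(d, abs((i + 1) / n - cdf), abs(i / n - cdf))
    return d
```

A log-normal model would follow the same pattern. Its CDF, however, must be renormalized to the region above `xmin`, and this sketch omits that step.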
I am completely new to this kind of analysis, but I have to report the GoF in my paper, so I am wondering:
- Did I do the fitting correctly? (I didn't modify the Matlab or R programs.)
- Is the Kolmogorov-Smirnov test a proper way to measure the GoF?
- If I remove the extreme cases (the 2 data points in the right tail), will the fit improve?
- Which distribution should I fit to my data?