statistic
Results 1 - 15 of about 616
RRD Statistics 1.0
RRDStats is a Coyote Linux and BrazilFW add-on package for network traffic monitoring, link quality control, and QOS classes monitoring.
RRD Statistics project is based on RRDtool for storing data to round robin databases, and a slightly modified RRDcgi for visualizing data through a Web interface.
Main features:
- Realtime graphical statistics for bandwidth usage and link quality
- Graphical statistics of QOS priority classes usage
- Historical data stored for one week
Configuration:
All default configuration is stored in /etc/rrd.config. This version supports web-based configuration, so no manual configuration is needed for basic package functionality. Just install the packages and browse to your web administration interface (by default it is http://192.168.0.1:8180). There should be a new link in the left menu labeled "RRDStats configuration".
There are a few basic options you should set to fit your configuration. First, make sure the RRDStats package is enabled (it is the first option on the configuration screen). Next, set your line speed (a rough approximation is good enough). The last thing to set is your Internet gateway IP address, which is used to measure your Internet link latency and packet loss.
Ignore the other configuration options for now, save your configuration, and reboot the router. After the system boots up, you can browse the RRD statistics.
After system startup, the package is initialized by /etc/rc.d/pkgs/rc.rrdstats. This script starts another copy of the tiny web server, which listens on port 8080 by default and serves its pages from the /var/rrd/www/ directory. After the web server starts, several data-gathering threads are started as well.
They read the amount of data transferred on the network interfaces and in the QOS classes, and measure link latency. These values are then stored in RRD databases, which by default live in the /var/rrd/data/ directory.
For further information on how RRD databases work, please visit the RRDtool homepage. Put simply, an RRD database has a constant size: it does not grow over time, and it stores data averaged over periods of time.
The last component of the RRDStats package is the set of .cgi and template files that display data from the RRD databases through the web interface. As noted above, these files and templates are stored in /var/rrd/www/ and its subdirectories.
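The constant-size behaviour of a round robin database can be illustrated with a small sketch. This is not RRDtool's file format or API; the RoundRobinArchive class and its slots and per_slot parameters are hypothetical names chosen for the illustration. It only shows the idea of averaging raw samples into a fixed number of slots and overwriting the oldest one.

```perl
use strict;
use warnings;

# Hypothetical fixed-size archive: NOT RRDtool's format, only an
# illustration of "constant size, stores averages over a period".
package RoundRobinArchive;

sub new {
    my ($class, %args) = @_;
    my $self = {
        slots    => $args{slots},       # number of averaged values kept
        per_slot => $args{per_slot},    # raw samples averaged into one slot
        ring     => [],                 # stored averages, oldest first
        pending  => [],                 # raw samples awaiting consolidation
    };
    return bless $self, $class;
}

sub update {
    my ($self, $value) = @_;
    push @{ $self->{pending} }, $value;
    return if @{ $self->{pending} } < $self->{per_slot};

    # Consolidate the pending samples into one average.
    my $sum = 0;
    $sum += $_ for @{ $self->{pending} };
    push @{ $self->{ring} }, $sum / @{ $self->{pending} };
    $self->{pending} = [];

    # Drop the oldest average once the archive is full.
    shift @{ $self->{ring} } while @{ $self->{ring} } > $self->{slots};
    return;
}

sub averages { return @{ $_[0]{ring} } }

package main;

my $rra = RoundRobinArchive->new(slots => 3, per_slot => 2);
$rra->update($_) for 10, 20, 30, 50, 5, 15, 100, 200;
my @averages = $rra->averages;
print "@averages\n";    # prints "40 10 150"
```

Eight samples produce four averages, but only the three most recent are kept, so the storage never grows.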
Download (0.010MB)
Added: 2005-12-27 License: GPL (GNU General Public License) Price:
1398 downloads
Statistics::LTU 2.8
Statistics::LTU is an implementation of Linear Threshold Units.
SYNOPSIS
use Statistics::LTU;
my $ltu = new Statistics::LTU::ACR(3, 1); # 3 attributes, scaled
$ltu->train([1,3,2], $LTU_PLUS);
$ltu->train([-1,3,0], $LTU_MINUS);
...
print "LTU looks like this:\n";
$ltu->print;
print "[1,5,2] is in class ";
if ($ltu->test([1,5,2]) > $LTU_THRESHOLD) { print "PLUS" }
else { print "MINUS" };
$ltu->save("ACR.saved") or die "Save failed!";
my $ltu2 = restore Statistics::LTU("ACR.saved");
EXPORTS
For readability, LTU.pm exports three scalar constants: $LTU_PLUS (+1), $LTU_MINUS (-1) and $LTU_THRESHOLD (0).
Statistics::LTU defines methods for creating, destroying, training and testing Linear Threshold Units. A linear threshold unit is a 1-layer neural network, also called a perceptron. LTUs are used to learn classifications from examples.
An LTU learns to distinguish between two classes based on the data given to it. After training on a number of examples, the LTU can then be used to classify new (unseen) examples. Technically, LTUs learn to distinguish two classes by fitting a hyperplane between examples; if the examples have n features, the hyperplane is defined by n weights (plus a threshold). In general, the LTU's weights will converge to define the separating hyperplane.
The LTU.pm file defines an uninstantiable base class, LTU, and four other instantiable classes built on top of LTU. The four individual classes differ in the training rules used:
ACR - Absolute Correction Rule
TACR - Thermal Absolute Correction Rule (thermal annealing)
LMS - Least Mean Squares rule
RLS - Recursive Least Squares rule
Each of these training rules behaves somewhat differently. Exact details of how these work are beyond the scope of this document; see the additional documentation file (ltu.doc) for discussion.
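To make the idea of fitting a hyperplane concrete, here is a minimal pure-Perl sketch of the classic fixed-increment perceptron rule. It is a generic illustration, not the module's ACR, TACR, LMS or RLS code; the function names train_perceptron and classify and the training data are invented for the example.

```perl
use strict;
use warnings;

# Fixed-increment perceptron rule; a generic sketch, not the module's
# own training rules. Class labels are +1 and -1, threshold is 0.
sub classify {
    my ($w, $x) = @_;
    my $sum = $w->[-1];                       # bias term
    $sum += $w->[$_] * $x->[$_] for 0 .. $#$x;
    return $sum > 0 ? 1 : -1;
}

sub train_perceptron {
    my ($examples, $max_epochs) = @_;
    my $n = scalar @{ $examples->[0][0] };    # number of features
    my @w = (0) x ($n + 1);                   # last weight is the bias

    for my $epoch (1 .. $max_epochs) {
        my $errors = 0;
        for my $ex (@$examples) {
            my ($x, $label) = @$ex;
            next if classify(\@w, $x) == $label;
            # Move the hyperplane toward the misclassified example.
            $w[$_] += $label * $x->[$_] for 0 .. $n - 1;
            $w[$n] += $label;
            $errors++;
        }
        last if $errors == 0;                 # all examples classified correctly
    }
    return \@w;
}

my @examples = (
    [ [ 2,  1],  1 ],
    [ [ 3,  2],  1 ],
    [ [-1, -2], -1 ],
    [ [-2, -1], -1 ],
);
my $weights = train_perceptron(\@examples, 100);
print "[4,3] is in class ",
    (classify($weights, [4, 3]) > 0 ? "PLUS" : "MINUS"), "\n";   # prints PLUS
```

On this linearly separable data the weights settle after a single correction; data that is not separable would simply stop at the epoch limit.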
Download (0.016MB)
Added: 2007-05-23 License: Perl Artistic License Price:
885 downloads
Statistics::SPC 0.1
Statistics::SPC is a Perl module with calculations for Statistical Process Control (SPC).
Creates thresholds based on the variability of all data, the number of samples not meeting spec, and the variability within sample sets, all from training data.
Note: this is only accurate for data that is normally distributed when the process is under control.
Recommended usage: at least 15 sample sets with a sample size >= 2 (5 is good). This module is fudged to work for a sample size of 1, but it is a better idea to use >= 2.
Important: the closer the process you are monitoring is to how you would like it to run (steady state), the better the calculated control limits will be.
Example: we take 5 recordings of the CPU utilization at random intervals over the course of a minute. We do this for 15 minutes, keeping all fifteen sample sets. Using these, we will be able to tell whether or not CPU use is in steady state.
SYNOPSIS
use Statistics::SPC;
my $spc = new Statistics::SPC;
$spc->n(5); # set the number of samples per set
$spc->Uspec(.50); # CPU should not be above 50% utilization
$spc->Lspec(.05); # CPU should not be below 5%
# (0 is boring in an example)
# Now feed training data into our object
$return = $spc->history($history); # "train the system";
# $history is ref to 2d array;
# $return > 1 means process not likely to
# meet the constraints of your specified
# upper and lower bounds
# now check to see if the latest sample of CPU util indicates
# CPU utilization was under control during the time of the sample
$return = $spc->test($data); # check one sample of size n
# $return < 0 there is something wrong with your data
# $return == 0 the sample is "in control"
# $return > 0 there are $return problems with the sample set
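As a rough illustration of how control limits can be derived from training data, the sketch below computes textbook Shewhart X-bar chart limits for sample sets of size 5. Whether Statistics::SPC uses exactly this formula is an assumption; the function names and the CPU figures are invented, and only three sample sets are used to keep the example short (the recommendation above is at least 15).

```perl
use strict;
use warnings;
use List::Util qw(sum max min);

# A2 = 0.577 is the standard tabulated X-bar chart constant for n = 5.
# Assumption: this is the textbook method, not necessarily the module's.
my $A2 = 0.577;

sub xbar_limits {
    my ($history) = @_;                       # ref to 2-D array of sample sets
    my @means  = map { sum(@$_) / scalar(@$_) } @$history;
    my @ranges = map { max(@$_) - min(@$_) } @$history;
    my $grand_mean = sum(@means)  / scalar(@means);
    my $mean_range = sum(@ranges) / scalar(@ranges);
    return ($grand_mean - $A2 * $mean_range,  # lower control limit
            $grand_mean + $A2 * $mean_range); # upper control limit
}

sub in_control {
    my ($sample, $lcl, $ucl) = @_;
    my $mean = sum(@$sample) / scalar(@$sample);
    return ($mean >= $lcl && $mean <= $ucl) ? 1 : 0;
}

# Made-up CPU utilization fractions, three sample sets of five readings.
my $history = [
    [0.20, 0.22, 0.18, 0.21, 0.19],
    [0.22, 0.21, 0.23, 0.24, 0.20],
    [0.21, 0.19, 0.22, 0.20, 0.23],
];
my ($lcl, $ucl) = xbar_limits($history);
printf "control limits: %.4f .. %.4f\n", $lcl, $ucl;
print "steady sample in control: ", in_control([0.21, 0.22, 0.20, 0.23, 0.19], $lcl, $ucl), "\n";
print "busy sample in control:   ", in_control([0.45, 0.50, 0.48, 0.52, 0.47], $lcl, $ucl), "\n";
```

The limits come from the training data alone, which is why the training period should reflect the steady state you want to monitor.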
Download (0.011MB)
Added: 2007-05-22 License: Perl Artistic License Price:
887 downloads
Statistics::OLS 0.07
Statistics::OLS is a Perl module to perform ordinary least squares and associated statistics.
SYNOPSIS
use Statistics::OLS;
my $ls = Statistics::OLS->new();
$ls->setData (\@xydataset) or die( $ls->error() );
$ls->setData (\@xdataset, \@ydataset);
$ls->regress();
my ($intercept, $slope) = $ls->coefficients();
my $R_squared = $ls->rsq();
my ($tstat_intercept, $tstat_slope) = $ls->tstats();
my $sigma = $ls->sigma();
my $durbin_watson = $ls->dw();
my $sample_size = $ls->size();
my ($avX, $avY) = $ls->av();
my ($varX, $varY, $covXY) = $ls->var();
my ($xmin, $xmax, $ymin, $ymax) = $ls->minMax();
# returned arrays are x-y or y-only data
# depending on initial call to setData()
my @predictedYs = $ls->predicted();
my @residuals = $ls->residuals();
I wrote Statistics::OLS to perform Ordinary Least Squares (linear curve fitting) on two-dimensional data: y = a + bx. The other simple statistical module I found on CPAN (Statistics::Descriptive) is designed for univariate analysis. It accommodates OLS, but somewhat inflexibly and without rich bivariate statistics. Nevertheless, it might make sense to fold OLS into that module or a supermodule someday.
Statistics::OLS computes the estimated slope and intercept of the regression line, their T-statistics, R squared, standard error of the regression and the Durbin-Watson statistic. It can also return the residuals.
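For reference, the slope, intercept and R squared of y = a + bx follow from simple closed-form sums. The sketch below is a standalone pure-Perl illustration of those textbook formulas, not the module's source; ols_fit is a hypothetical helper name.

```perl
use strict;
use warnings;

# Closed-form least squares for y = a + b*x (textbook formulas).
sub ols_fit {
    my ($x, $y) = @_;                         # two array refs of equal length
    my $n = scalar @$x;
    my ($sum_x, $sum_y) = (0, 0);
    $sum_x += $_ for @$x;
    $sum_y += $_ for @$y;
    my ($mean_x, $mean_y) = ($sum_x / $n, $sum_y / $n);

    my ($sxx, $sxy, $syy) = (0, 0, 0);
    for my $i (0 .. $n - 1) {
        my ($dx, $dy) = ($x->[$i] - $mean_x, $y->[$i] - $mean_y);
        $sxx += $dx * $dx;
        $sxy += $dx * $dy;
        $syy += $dy * $dy;
    }
    my $slope     = $sxy / $sxx;
    my $intercept = $mean_y - $slope * $mean_x;
    my $r_squared = ($sxy * $sxy) / ($sxx * $syy);
    return ($intercept, $slope, $r_squared);
}

my ($intercept, $slope, $r_squared) = ols_fit([1, 2, 3, 4], [3, 5, 7, 9]);
printf "intercept=%.2f slope=%.2f R^2=%.2f\n", $intercept, $slope, $r_squared;
# prints "intercept=1.00 slope=2.00 R^2=1.00"
```

The data lie exactly on y = 1 + 2x, so the fit recovers those coefficients with R squared equal to 1.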
It is pretty simple to do two dimensional least squares, but much harder to do multiple regression, so OLS is unlikely ever to work with multiple independent variables.
This is beta code and has not been extensively tested. It has worked on a few published datasets. Feedback is welcome, particularly if you notice an error or try it with known results that are not reproduced correctly.
Download (0.008MB)
Added: 2007-05-23 License: Perl Artistic License Price:
531 downloads
Statistics::Gap 0.10
Statistics::Gap Perl module is an adaptation of the Gap Statistic.
SYNOPSIS
use Statistics::Gap;
$predictedk = &gap("prefix", "vec", INPUTMATRIX, "rbr", "h2", 30, 10, "rep", 90, 4);
OR
use Statistics::Gap;
$predictedk = &gap("prefix", "vec", INPUTMATRIX, "rbr", "h2", 30, 10, "rep", 90, 4, 7);
INPUTS
1. Prefix: The string to be used as a prefix when naming the intermediate files and the .dat files (plot files).
2. Space: Specifies the space in which the clustering should be performed.
Valid parameter values:
vec - vector space
sim - similarity space
3. InputMatrix: Path to input matrix file. (More details about the input file-format below.)
4. ClusteringMethod: Specifies the clustering method to be used. (Learn more about this at: http://glaros.dtc.umn.edu/gkhome/cluto/cluto/overview)
Valid parameter values:
rb - Repeated Bisections
rbr - Repeated Bisections followed by k-way refinement
direct - Direct k-way clustering
agglo - Agglomerative clustering
bagglo - Partitional biased Agglomerative clustering
NOTE: bagglo can be used only if space=vec
5. Crfun: Specifies the criterion function to be used for finding clustering solutions. (Learn more about this at: http://glaros.dtc.umn.edu/gkhome/cluto/cluto/overview)
Valid parameter values:
i1 - I1 Criterion function
i2 - I2 Criterion function
e1 - E1 Criterion function
h1 - H1 Criterion function
h2 - H2 Criterion function
6. K: This is an approximate upper bound for the number of clusters that may be present in the dataset.
7. B: The number of replicates/references to be generated.
8. TypeRef: Specifies whether to generate B replicates from a reference or to generate B references.
Valid parameter values:
rep - replicates
ref - references
9. Percentage: Specifies the percentage confidence to be reported in the log file. Since Statistics::Gap uses a parametric bootstrap method to generate the reference distribution, it is critical to understand the interval around the sample mean that could contain the population ("true") mean, and with what certainty.
10. Precision: Specifies the precision to be used while generating the reference distribution.
11. Seed: The seed to be used with the random number generator. (This is an optional parameter. By default no seed is set.)
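The roles of K (the upper bound on the number of clusters) and B (the number of replicates/references) are easier to see in the selection rule of the original Gap Statistic. The sketch below follows that textbook rule; since the module is described as an adaptation, its exact computation may differ. The function name predict_k and all numbers are made up, and the clustering error values are assumed to be already computed and log-transformed.

```perl
use strict;
use warnings;

# $observed->[$i]  : log clustering error for k = $i + 1 on the real data
# $reference->[$i] : the same quantity for each of B reference datasets
# Returns the smallest k with Gap(k) >= Gap(k+1) - s(k+1).
sub predict_k {
    my ($observed, $reference) = @_;
    my (@gap, @s);
    for my $i (0 .. $#$observed) {
        my @ref   = @{ $reference->[$i] };
        my $count = scalar @ref;              # B
        my $mean  = 0;
        $mean += $_ / $count for @ref;
        my $var = 0;
        $var += ($_ - $mean) ** 2 / $count for @ref;
        push @gap, $mean - $observed->[$i];   # Gap(k)
        push @s,   sqrt($var) * sqrt(1 + 1 / $count);
    }
    for my $i (0 .. $#gap - 1) {
        return $i + 1 if $gap[$i] >= $gap[$i + 1] - $s[$i + 1];
    }
    return scalar @gap;                       # fall back to the upper bound K
}

# Made-up values for K = 3 and B = 2.
my @observed  = (5.0, 3.0, 2.8);
my @reference = ([5.1, 5.3], [4.4, 4.6], [4.1, 4.3]);
my $predicted_k = predict_k(\@observed, \@reference);
print "predicted k = $predicted_k\n";         # prints "predicted k = 2"
```

Here the error drops sharply from k = 1 to k = 2 on the real data but not on the references, so the gap peaks at k = 2.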
Download (2.5MB)
Added: 2007-05-23 License: Perl Artistic License Price:
884 downloads
Statistics::SDT 0.02
Statistics::SDT Perl package contains signal detection theory measures of sensitivity and response-bias.
SYNOPSIS
use Statistics::SDT;
$sdt = Statistics::SDT->new(
{
hits => 50,
signal_trials => 50,
false_alarms => 17,
noise_trials => 25,
correct => 2,
}
);
$d = $sdt->d_sensitivity();
$c = $sdt->decision_bias();
Signal Detection Theory algorithms (e.g., of d', A', decision bias), as prescribed by Stanislaw & Todorov (1999). Both object- and function-oriented interfaces are provided.
KEY VALUES
For both object- and function-oriented styles, the following named parameters must be given as a hash-reference: either to the new constructor method, or (with the function-oriented style) into each function. Basically, either all of the first four parameters are required (in order to calculate the hit-rate and false-alarm-rate), or the required rates are themselves supplied.
hits
The number of hits.
false_alarms
The number of false alarms.
signal_trials
The number of signal trials. The hit-rate is derived by dividing the number of hits by the number of signal trials.
noise_trials
The number of noise trials. The false-alarm-rate is derived by dividing the number of false-alarms by the number of noise trials.
alternatives
The number of response alternatives. Default = 2 (for the classic signal-detection situation of discriminating between signal+noise and noise-only). If the number of alternatives is greater than 2, the measure of sensitivity, when calling d_sensitivity, is based on the Smith (1982) algorithms.
correct
Indicates whether or not to correct the number of hits and false-alarms when the hit-rate or false-alarm-rate equals 0 or 1 (due, e.g., to strong inducements against false-alarms, or easy discrimination between signals and noise). This is relevant to all functions that make use of the inverse phi function (all except a_sensitivity and griers_bias).
If set to greater than 1, the loglinear transformation is applied, i.e., 0.5 is added to both the number of hits and false-alarms, and 1 is added to the number of signal and noise trials. These adjustments are made irrespective of the extremity of the rates themselves.
If set to 1, extreme rates (of 0 and 1, only) are replaced with values based on the number of signal/noise trials, moderated by a value of 0.5. Specifically, where n = number of signal or noise trials: 0 is replaced with 0.5 / n; 1 is replaced with (n - 0.5) / n.
Stanislaw and Todorov (1999) advise that the latter correction is the most common method of handling extreme rates, but that it might bias sensitivity measures and not be as satisfactory as the loglinear transformation applied to all hits and false-alarms.
If set to zero (the default), no correction is performed to the calculation of the rates. This should only be used when you are using (1) the parametric measures and are sure the rates are not at the extremes of 0 and 1; or (2) the nonparametric algorithms (a_sensitivity and griers_bias). An alternative to these corrections is, indeed, to use the nonparametric measures.
hr
This is the hit-rate. Instead of passing the number of hits and signal trials, give the hit-rate directly - but, if doing so, ensure the rate does not equal zero or 1 in order to avoid errors thrown by the inverse-phi function (which will be given as "ndtri domain error").
far
This is the false-alarm-rate. Instead of passing the number of false alarms and noise trials, give the false-alarm-rate directly - but, if doing so, ensure the rate does not equal zero or 1 in order to avoid errors thrown by the inverse-phi function (which will be given as "ndtri domain error").
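The two corrections described under the correct parameter are plain arithmetic, and can be worked through with the counts from the synopsis (50 hits in 50 signal trials, 17 false alarms in 25 noise trials). The helper names below are hypothetical; this is a sketch of the stated formulas, not the module's code.

```perl
use strict;
use warnings;

# Loglinear transformation: applied regardless of how extreme the rate is.
sub loglinear_rate {
    my ($count, $trials) = @_;
    return ($count + 0.5) / ($trials + 1);
}

# Replacement of extreme rates only (0 and 1); other rates are left alone.
sub extreme_only_rate {
    my ($count, $trials) = @_;
    return 0.5 / $trials             if $count == 0;
    return ($trials - 0.5) / $trials if $count == $trials;
    return $count / $trials;
}

printf "loglinear: hr=%.4f far=%.4f\n",
    loglinear_rate(50, 50), loglinear_rate(17, 25);
printf "extremes:  hr=%.4f far=%.4f\n",
    extreme_only_rate(50, 50), extreme_only_rate(17, 25);
# prints:
# loglinear: hr=0.9902 far=0.6731
# extremes:  hr=0.9900 far=0.6800
```

Without either correction the hit-rate here would be exactly 1, which is the case that triggers the "ndtri domain error" mentioned above.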
Download (0.007MB)
Added: 2007-05-23 License: Perl Artistic License Price:
889 downloads
Statistics::ROC 0.04
Statistics::ROC is a Perl module with receiver-operator-characteristic (ROC) curves with nonparametric confidence bounds.
SYNOPSIS
use Statistics::ROC;
my ($y) = loggamma($x);
my ($y) = betain($x, $p, $q, $beta);
my ($y) = Betain($x, $p, $q);
my ($y) = xinbta($p, $q, $beta, $alpha);
my ($y) = Xinbta($p, $q, $alpha);
my (@rk) = rank($type, @r);
my (@ROC) = roc($model_type,$conf,@val_grp);
This program determines the ROC curve and its nonparametric confidence bounds for data categorized into two groups. A ROC curve shows the relationship of probability of false alarm (x-axis) to probability of detection (y-axis) for a certain test. Expressed in medical terms: the probability of a positive test, given no disease to the probability of a positive test, given disease. The ROC curve may be used to determine an optimal cutoff point for the test.
The main function is roc(). The other exported functions are used by roc(), but might be useful for other nonparametric statistical procedures.
loggamma
This procedure evaluates the natural logarithm of gamma(x) for all x>0, accurate to 10 decimal places. Stirling's formula is used for the central polynomial part of the procedure. For x=0 a value of 743.746924740801 will be returned: this is loggamma(9.9999999999E-324).
betain
Computes incomplete beta function ratio
Remarks:
Complete beta function: B(p,q)=gamma(p)*gamma(q)/gamma(p+q)
log(B(p,q))=ln(gamma(p))+ln(gamma(q))-ln(gamma(p+q))
Incomplete beta function ratio:
I_x(p,q)=1/B(p,q) * int_0^x t^{p-1}*(1-t)^{q-1} dt
--> log(B(p,q)) has to be supplied to calculate I_x(p,q)
log denotes the natural logarithm
$beta = log(B(p,q))
$x = x
$p = p
$q = q
The subroutine returns I_x(p,q). If an error occurs a negative value
{-1,-2} is returned.
Betain
Computes the incomplete beta function by calling loggamma() and betain().
xinbta
Computes inverse of incomplete beta function ratio
Remarks:
Complete beta function: B(p,q)=gamma(p)*gamma(q)/gamma(p+q)
log(B(p,q))=ln(gamma(p))+ln(gamma(q))-ln(gamma(p+q))
Incomplete beta function ratio:
alpha = I_x(p,q) = 1/B(p,q) * int_0^x t^{p-1}*(1-t)^{q-1} dt
--> log(B(p,q)) has to be supplied to calculate I_x(p,q)
log denotes the natural logarithm
$beta = log(B(p,q))
$alpha= I_x(p,q)
$p = p
$q = q
The subroutine returns x. If an error occurs a negative value {-1,-2,-3}
is returned.
Xinbta
Computes the inverse of the incomplete beta function by calling loggamma() and xinbta().
rank
Computes the ranks of the values specified as the second argument (an array). Returns a vector of ranks corresponding to the input vector. Different types of ranking are possible (high, low, mean), and are specified as first argument. These differ in the way ties of the input vector, i.e. identical values, are treated:
high:
replace ranks of identical values with their highest rank
low:
replace ranks of identical values with their lowest rank
mean:
replace ranks of identical values with the mean of their ranks
roc
Determines the ROC curve and its nonparametric confidence bounds. The ROC curve shows the relationship of "probability of false alarm" (x-axis) to "probability of detection" (y-axis) for a certain test. Or in medical terms: the "probability of a positive test, given no disease" to the "probability of a positive test, given disease". The ROC curve may be used to determine an "optimal" cutoff point for the test.
The routine takes three arguments:
(1) type of model: decrease or increase. This states the assumption that a higher value of the data (increase) or a lower value (decrease) tends to indicate a positive test result.
(2) two-sided confidence interval (usually 0.95 is chosen).
(3) the data stored as a list-of-lists: each entry in this list consists of a "value / true group" pair, i.e. value / disease present. Group values are from {0,1}. 0 stands for disease (or signal) not present (prior knowledge) and 1 for disease (or signal) present (prior knowledge). Example: @s=([2, 0], [12.5, 1], [3, 0], [10, 1], [9.5, 0], [9, 1]); Notice the small overlap of the groups. The optimal cutoff point to separate the two groups would be between 9 and 9.5 if the criterion of optimality is to maximize the probability of detection and simultaneously minimize the probability of false alarm.
Returns a list-of-lists with the three curves: @ROC=([@lower_b], [@roc], [@upper_b]) each of the curves is again a list-of-lists with each entry consisting of one (x,y) pair.
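To make the (x,y) pairs concrete, here is a minimal sketch that computes only the empirical curve, without the confidence bounds. It is not the module's roc(): the helper name empirical_roc is hypothetical, and the convention that a value equal to the cutoff counts as a positive test is an assumption.

```perl
use strict;
use warnings;

# Hypothetical helper, not the module's roc(): returns one
# [false alarm, detection, cutoff] triple per distinct cutoff.
# 'increase' treats values >= cutoff as positive, 'decrease' values <= cutoff.
sub empirical_roc {
    my ($model, $pairs) = @_;
    my $n_pos = grep { $_->[1] == 1 } @$pairs;   # signal present
    my $n_neg = @$pairs - $n_pos;                # signal absent
    my %seen;
    my @cutoffs = sort { $a <=> $b } grep { !$seen{$_}++ } map { $_->[0] } @$pairs;
    my @points;
    for my $c (@cutoffs) {
        my ($tp, $fp) = (0, 0);
        for my $p (@$pairs) {
            my $positive = $model eq 'increase' ? $p->[0] >= $c : $p->[0] <= $c;
            next unless $positive;
            $p->[1] == 1 ? $tp++ : $fp++;
        }
        push @points, [ $fp / $n_neg, $tp / $n_pos, $c ];
    }
    return @points;
}

my @s = ([2, 0], [12.5, 1], [3, 0], [10, 1], [9.5, 0], [9, 1]);
printf "cutoff %-5s false alarm %.3f detection %.3f\n", $_->[2], $_->[0], $_->[1]
    for empirical_roc('increase', \@s);
```

Each printed line is one point of the curve: the x value is the share of group 0 entries called positive, the y value the share of group 1 entries called positive.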
Examples:
$,=" ";
print loggamma(10), "\n";
print Xinbta(3,4,Betain(.6,3,4)),"\n";
@e=(0.7, 0.7, 0.9, 0.6, 1.0, 1.1, 1,.7,.6);
print rank('low',\@e),"\n";
print rank('high',\@e),"\n";
print rank('mean',\@e),"\n";
@var_grp=([1.5,0],[1.4,0],[1.4,0],[1.3,0],[1.2,0],[1,0],[0.8,0],
          [1.1,1],[1,1],[1,1],[0.9,1],[0.7,1],[0.7,1],[0.6,1]);
@curves=roc('decrease',0.95,\@var_grp);
print "$curves[0][2][0] $curves[0][2][1] \n";
Download (0.017MB)
Added: 2007-05-23 License: Perl Artistic License Price:
885 downloads
Free Statistics 1.1.0
Free Statistics is a free PHP script that records and displays daily website page views (hits) for statistical tracking. It features a bar chart of daily page view totals, the total for the last x days, the most hits in a day for the last x days, the average hits per day for the last x days, projected hits for today, and more. Easy to install.
Edit the values in config.php for MySQL; changing the other variables is optional. Do not edit other files.
Copy the files to the same directory on your server.
Install the MySQL table by executing the following in phpMyAdmin or another MySQL interface:
CREATE TABLE stats_day (
  date date DEFAULT '0000-00-00' NOT NULL,
  hits mediumint(8) unsigned DEFAULT '0' NOT NULL,
  PRIMARY KEY (date)
);
For php files, you can add this code to each page to record page views to it (be sure to add the path if needed):
If the page is in a different folder than the stats script, you can add the path such as:
You can record stats for non-php pages (and php pages also) by adding this code in the body of the html (remember to add the correct path to the script; you can use a full url here; Note, this only records hits for browsers with images-loading enabled):
Main features:
- Chart of daily page views totals displayed with bar graph, total for last x days, most hits in a day for last x days, average hits per day for last x days, projected hits for today, and more. Easy to install.
Download (0.006MB)
Added: 2006-06-23 License: GPL (GNU General Public License) Price:
1222 downloads
Statistics::TTest 1.1.0
Statistics::TTest is a Perl module to perform T-test on 2 independent samples.
Statistics::TTest::Sufficient - Perl module to perform T-Test on 2 independent samples using sufficient statistics
SYNOPSIS
#example for Statistics::TTest
use Statistics::PointEstimation;
use Statistics::TTest;
my @r1=();
my @r2=();
my $rand;
# for($i=1;$i ... (the loop that fills @r1 and @r2 and the creation of $ttest are truncated in this listing)
$ttest->set_significance(90);
$ttest->load_data(\@r1,\@r2);
$ttest->output_t_test();
$ttest->set_significance(99);
$ttest->print_t_test(); #list out t-test related data
#the following is the same as calling output_t_test() (you can check if $ttest->{valid}==1 to check if the data is valid.)
my $s1=$ttest->{s1}; #sample 1 a Statistics::PointEstimation object
my $s2=$ttest->{s2}; #sample 2 a Statistics::PointEstimation object
print "*****************************************************\n\n";
$s1->output_confidence_interval(1);
print "*****************************************************\n\n";
$s2->output_confidence_interval(2);
print "*****************************************************\n\n";
print "Comparison of these 2 independent samples.\n";
print "\t F-statistic=",$ttest->f_statistic()," , cutoff F-statistic=",$ttest->f_cutoff(),
      " with alpha level=",$ttest->alpha*2," and df =(",$ttest->df1,",",$ttest->df2,")\n";
if($ttest->{equal_variance})
{ print "\tequal variance assumption is accepted(not rejected) since F-statistic < cutoff F-statistic\n";}
else
{ print "\tequal variance assumption is rejected since F-statistic > cutoff F-statistic\n";}
print "\tdegree of freedom=",$ttest->df," , t-statistic=T=",$ttest->t_statistic," Prob >|T|=",$ttest->{t_prob},"\n";
print "\tthe null hypothesis (the 2 samples have the same mean) is ",$ttest->null_hypothesis(),
      " since the alpha level is ",$ttest->alpha()*2,"\n";
print "\tdifference of the mean=",$ttest->mean_difference(),", standard error=",$ttest->standard_error(),"\n";
print "\t the estimate of the difference of the mean is ", $ttest->mean_difference()," +/- ",$ttest->delta(),"\n\t",
      " or (",$ttest->lower_clm()," to ",$ttest->upper_clm," ) with ",$ttest->significance," % of confidence\n";
#example for Statistics::TTest::Sufficient
use Statistics::PointEstimation;
use Statistics::TTest;
my %sample1=(
    count    =>30,
    mean     =>3.98,
    variance =>2.63
);
my %sample2=(
    count    =>30,
    mean     =>3.67,
    variance =>1.12
);
my $ttest = new Statistics::TTest::Sufficient;
$ttest->set_significance(90);
$ttest->load_data(\%sample1,\%sample2);
$ttest->output_t_test();
#$ttest->s1->print_confidence_interval();
$ttest->set_significance(99);
$ttest->output_t_test();
#$ttest->s1->print_confidence_interval();
Statistics::TTest
This is the statistical T-Test module to compare 2 independent samples. It takes 2 arrays of point measures, computes the confidence intervals using the PointEstimation module (which is also included in this package) and uses the T-statistic to test the null hypothesis. If the null hypothesis is rejected, the difference will be given as the lower_clm and upper_clm of the TTest object.
Statistics::TTest::Sufficient
This module is a subclass of Statistics::TTest. Instead of taking the real data points as input, it computes the confidence intervals from the sufficient statistics and the sample size. To use this module, you need to pass the sample size, the sample mean, and the sample variance into the load_data() function. The output will be exactly the same as that of the Statistics::TTest module.
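To show why count, mean and variance are enough, the following sketch computes a pooled-variance t statistic from those three numbers alone. It is an illustration in core Perl, not the module's code: the helper name pooled_t is hypothetical, and it assumes equal variances in both samples (the module, by contrast, runs an F test first to decide on that assumption).

```perl
use strict;
use warnings;

# Hypothetical helper, not part of Statistics::TTest: pooled-variance
# two-sample t statistic from sufficient statistics only.
sub pooled_t {
    my ($s1, $s2) = @_;
    my $df = $s1->{count} + $s2->{count} - 2;
    my $pooled_var = (($s1->{count} - 1) * $s1->{variance}
                    + ($s2->{count} - 1) * $s2->{variance}) / $df;
    my $std_err = sqrt($pooled_var * (1 / $s1->{count} + 1 / $s2->{count}));
    return (($s1->{mean} - $s2->{mean}) / $std_err, $df, $std_err);
}

# Same figures as the Statistics::TTest::Sufficient example above.
my %sample1 = (count => 30, mean => 3.98, variance => 2.63);
my %sample2 = (count => 30, mean => 3.67, variance => 1.12);
my ($t, $df, $se) = pooled_t(\%sample1, \%sample2);
printf "t = %.4f with %d degrees of freedom (standard error %.4f)\n", $t, $df, $se;
```

The p-value and the confidence limits additionally need the t distribution, which this sketch leaves out.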
Download (0.006MB)
Added: 2006-12-18 License: Perl Artistic License Price:
1044 downloads
Statistics::LineFit 0.07
Statistics::LineFit is a Perl module for weighted or unweighted least-squares line fitting.
SYNOPSIS
use Statistics::LineFit;
$lineFit = Statistics::LineFit->new();
$lineFit->setData (\@xValues, \@yValues) or die "Invalid data";
($intercept, $slope) = $lineFit->coefficients();
defined $intercept or die "Can't fit line if x values are all equal";
$rSquared = $lineFit->rSquared();
$meanSquaredError = $lineFit->meanSqError();
$durbinWatson = $lineFit->durbinWatson();
$sigma = $lineFit->sigma();
($tStatIntercept, $tStatSlope) = $lineFit->tStatistics();
@predictedYs = $lineFit->predictedYs();
@residuals = $lineFit->residuals();
($varianceIntercept, $varianceSlope) = $lineFit->varianceOfEstimates();
The Statistics::LineFit module does weighted or unweighted least-squares line fitting to two-dimensional data (y = a + b * x). (This is also called linear regression.) In addition to the slope and y-intercept, the module can return the square of the correlation coefficient (R squared), the Durbin-Watson statistic, the mean squared error, sigma, the t statistics, the variance of the estimates of the slope and y-intercept, the predicted y values and the residuals of the y values. (See the METHODS section for a description of these statistics.)
The module accepts input data in separate x and y arrays or a single 2-D array (an array of arrayrefs). The optional weights are input in a separate array. The module can optionally verify that the input data and weights are valid numbers. If weights are input, the line fit minimizes the weighted sum of the squared errors and the following statistics are weighted: the correlation coefficient, the Durbin-Watson statistic, the mean squared error, sigma and the t statistics.
The module is state-oriented and caches its results. Once you call the setData() method, you can call the other methods in any order or call a method several times without invoking redundant calculations. After calling setData(), you can modify the input data or weights without affecting the module's results.
The decision to use or not use weighting could be made using your a priori knowledge of the data or using supplemental data. If the data is sparse or contains non-random noise, weighting can degrade the solution. Weighting is a good option if some points are suspect or less relevant (e.g., older terms in a time series, points that are known to have more noise).
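The effect of weights on the fit can be sketched as follows. This is a minimal illustration of minimizing the weighted sum of squared errors, not the module's implementation; the helper name weighted_line_fit and the sample data are hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper, not the module's implementation: fits y = a + b*x
# by minimizing sum(w * (y - a - b*x)**2). Weights default to 1.
sub weighted_line_fit {
    my ($x, $y, $w) = @_;
    $w ||= [ (1) x @$x ];
    my ($sw, $swx, $swy) = (0, 0, 0);
    for my $i (0 .. $#$x) {
        $sw  += $w->[$i];
        $swx += $w->[$i] * $x->[$i];
        $swy += $w->[$i] * $y->[$i];
    }
    my ($mean_x, $mean_y) = ($swx / $sw, $swy / $sw);   # weighted means
    my ($sxx, $sxy) = (0, 0);
    for my $i (0 .. $#$x) {
        $sxx += $w->[$i] * ($x->[$i] - $mean_x) ** 2;
        $sxy += $w->[$i] * ($x->[$i] - $mean_x) * ($y->[$i] - $mean_y);
    }
    return if $sxx == 0;    # all x values equal: no unique line
    my $slope = $sxy / $sxx;
    return ($mean_y - $slope * $mean_x, $slope);
}

my @x = (1, 2, 3, 4);
my @y = (3, 5, 7, 9);
my ($intercept, $slope) = weighted_line_fit(\@x, \@y);
print "intercept $intercept, slope $slope\n";

# A suspect last point can be switched off with a zero weight.
my ($a2, $b2) = weighted_line_fit(\@x, [3, 5, 7, 100], [1, 1, 1, 0]);
print "intercept $a2, slope $b2\n";
```

Like coefficients() in the synopsis, the sketch returns nothing when all x values are equal, so the caller can test the intercept with defined.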
Download (0.024MB)
Added: 2007-07-12 License: Perl Artistic License Price:
835 downloads
Statistics::Forecast 0.3
Statistics::Forecast is a Perl module that calculates a future value.
This is a simple object-oriented module that calculates a future value from existing values. The new value is calculated by linear regression.
SYNOPSIS
use Statistics::Forecast;
Create forecast object
my $FCAST = Statistics::Forecast->new("My Forecast Name");
Add data
$FCAST->{DataX} = \@Array_X;
$FCAST->{DataY} = \@Array_Y;
$FCAST->{NextX} = $NextX;
Calculate the result
$FCAST->calc;
Get the result
my $Result_Forecast = $FCAST->{ForecastY};
INTERNALS
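The equation for Forecast is:
y = a + b x, where x is the value to forecast for (NextX) and
a = \bar{y} - b \bar{x}
b = \sum (x - \bar{x})(y - \bar{y}) / \sum (x - \bar{x})^2
Here \bar{x} and \bar{y} are the averages of the X and Y values.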
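The regression step can be checked by hand with a few lines of core Perl. This is a sketch of ordinary least squares, not the module's code; the helper name forecast is hypothetical, and the data are the values used in the EXAMPLE section.

```perl
use strict;
use warnings;

# Hypothetical helper, not the module's code: least-squares forecast with
# slope = Sxy / Sxx and intercept = mean(y) - slope * mean(x).
sub forecast {
    my ($xs, $ys, $next_x) = @_;
    my $n = @$xs;
    my ($sum_x, $sum_y) = (0, 0);
    $sum_x += $_ for @$xs;
    $sum_y += $_ for @$ys;
    my ($avg_x, $avg_y) = ($sum_x / $n, $sum_y / $n);
    my ($sxy, $sxx) = (0, 0);
    for my $i (0 .. $n - 1) {
        $sxy += ($xs->[$i] - $avg_x) * ($ys->[$i] - $avg_y);
        $sxx += ($xs->[$i] - $avg_x) ** 2;
    }
    my $slope     = $sxy / $sxx;
    my $intercept = $avg_y - $slope * $avg_x;
    return $intercept + $slope * $next_x;
}

my @X = (1, 2, 3, 4);
my @Y = (1, 3, 7, 12);
print "forecast for x = 8: ", forecast(\@X, \@Y, 8), "\n";
```

For these data the slope is 18.5 / 5 = 3.7 and the intercept is 5.75 - 3.7 * 2.5 = -3.5, so the forecast for x = 8 is 26.1.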
METHODS
new
Receives a forecast name, used only as a label, and returns the blessed data structure as a Statistics::Forecast object.
my $FCAST = Statistics::Forecast->new("My Forecast");
calc
Calculate and return the forecast value.
$FCAST->calc;
dump
Prints data for debugging purposes.
$FCAST->dump;
SumX
Returns the sum of X values.
my $SumOfX = $FCAST->{SumX};
SumY
Returns the sum of Y values.
my $SumOfY = $FCAST->{SumY};
SumXX
Returns the sum of X**2 values.
my $SumOfXX = $FCAST->{SumXX};
SumXY
Returns the sum of X * Y values.
my $SumOfXY = $FCAST->{SumXY};
AvgX
Returns the average of X values.
my $AvgX = $FCAST->{AvgX};
AvgY
Returns the average of Y values.
my $AvgY = $FCAST->{AvgY};
N
Return the number of X values.
my $N = $FCAST->{N};
EXAMPLE
use Statistics::Forecast;
my @Y = (1,3,7,12);
my @X = (1,2,3,4);
my $FCAST = Statistics::Forecast->new("My Forecast");
$FCAST->{DataX} = \@X;
$FCAST->{DataY} = \@Y;
$FCAST->{NextX} = 8;
$FCAST->calc;
print "The Forecast ", $FCAST->{ForecastName};
print " has the forecast value: ", $FCAST->{ForecastY}, "\n";
Download (0.003MB)
Added: 2007-05-23 License: Perl Artistic License Price:
887 downloads
Statistics::Hartigan 0.01
Statistics::Hartigan is a Perl extension for the stopping rule proposed by J. Hartigan. Reference: Hartigan, J. (1975). Clustering Algorithms. John Wiley and Sons, New York, NY, US.
SYNOPSIS
use Statistics::Hartigan;
&hartigan(InputFile, "agglo", 6, 10);
Input file is expected in the "dense" format -
Sample Input file:
6 5
1 1 0 0 1
1 0 0 0 0
1 1 0 0 1
1 1 0 0 1
1 0 0 0 1
1 1 0 0 1
Hartigan uses the Within Cluster/Group Sum of Squares (WGSS) to estimate the number of clusters a given data set naturally falls into. The goal is to minimize WGSS.
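WGSS itself is easy to compute once rows have been assigned to clusters. The sketch below is an illustration only, not the module's code: the helper name wgss and the cluster assignments are hypothetical, and it assumes that the first line of the sample file ("6 5") gives the number of rows and columns, so only the six data rows are used.

```perl
use strict;
use warnings;

# Hypothetical helper: within-group sum of squares for a given assignment
# of row vectors to clusters (squared Euclidean distance to each centroid).
sub wgss {
    my ($rows, $labels) = @_;
    my (%sum, %count);
    for my $i (0 .. $#$rows) {
        my $k = $labels->[$i];
        $count{$k}++;
        $sum{$k}[$_] += $rows->[$i][$_] for 0 .. $#{ $rows->[$i] };
    }
    my $total = 0;
    for my $i (0 .. $#$rows) {
        my $k = $labels->[$i];
        for my $j (0 .. $#{ $rows->[$i] }) {
            my $centroid = $sum{$k}[$j] / $count{$k};
            $total += ($rows->[$i][$j] - $centroid) ** 2;
        }
    }
    return $total;
}

my @rows = ([1,1,0,0,1], [1,0,0,0,0], [1,1,0,0,1],
            [1,1,0,0,1], [1,0,0,0,1], [1,1,0,0,1]);
print "1 cluster:  ", wgss(\@rows, [0,0,0,0,0,0]), "\n";
print "2 clusters: ", wgss(\@rows, [0,1,0,0,1,0]), "\n";
```

Splitting the two rows that lack the second feature into their own cluster lowers WGSS from 13/6 to 0.5; a stopping rule then has to decide whether such a drop justifies the extra cluster.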
Download (0.006MB)
Added: 2007-05-23 License: Perl Artistic License Price:
884 downloads
Statistics::ChiSquare 0.5
Statistics::ChiSquare - How well-distributed is your data?
SYNOPSIS
use Statistics::ChiSquare;
print chisquare(@array_of_numbers);
Statistics::ChiSquare is available at a CPAN site near you.
Suppose you flip a coin 100 times, and it turns up heads 70 times. Is the coin fair?
Suppose you roll a die 100 times, and it shows 30 sixes. Is the die loaded?
In statistics, the chi-square test calculates how well a series of numbers fits a distribution. In this module, we only test for whether results fit an even distribution. It doesn't simply say "yes" or "no". Instead, it gives you a confidence interval, which sets upper and lower bounds on the likelihood that the variation in your data is due to chance. See the examples below.
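For the coin question above, the underlying statistic can be worked out directly. This sketch is not the module's chisquare(), which returns an English sentence; the helper name chi_square_even is hypothetical and it returns only the raw statistic and the degrees of freedom.

```perl
use strict;
use warnings;

# Hypothetical helper, not the module's chisquare(): raw chi-square
# statistic and degrees of freedom against an even distribution.
sub chi_square_even {
    my @observed = @_;
    my $total = 0;
    $total += $_ for @observed;
    my $expected = $total / @observed;      # same expected count per category
    my $chi2 = 0;
    $chi2 += ($_ - $expected) ** 2 / $expected for @observed;
    return ($chi2, @observed - 1);
}

my ($chi2, $df) = chi_square_even(70, 30);   # 70 heads, 30 tails
print "chi-square = $chi2, df = $df\n";
```

With 50 expected in each category the statistic is 400/50 + 400/50 = 16 on one degree of freedom, far above the 5% critical value of 3.84, so a fair coin is very unlikely to produce such a split.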
If you've ever studied elementary genetics, you've probably heard about Gregor Mendel. He was a wacky Austrian botanist who discovered (in 1865) that traits could be inherited in a predictable fashion. He did lots of experiments with cross breeding peas: green peas, yellow peas, smooth peas, wrinkled peas. A veritable Brave New World of legumes.
But Mendel's results look too good to be true. A statistician by the name of R. A. Fisher used the chi-square test to show that the data fit the expected ratios more closely than chance would plausibly allow.
There's just one function in this module: chisquare(). Instead of returning the bounds on the confidence interval in a tidy little two-element array, it returns an English string. This was a deliberate design choice: many people misinterpret chi-square results, and the string helps clarify the meaning.
The string returned by chisquare() will always match one of these patterns:
"There's a >\d+% chance, and a ...
EXAMPLES
Imagine a coin flipped 1000 times. The expected outcome is 500 heads and 500 tails:
@coin = (500, 500);
print chisquare(@coin);
prints "There's a >90% chance, and a ...
Download (0.005MB)
Added: 2007-05-22 License: Perl Artistic License Price:
889 downloads
Statistics::ChisqIndep 0.1
Statistics::ChisqIndep is a Perl module to perform chi-square test of independence (a.k.a. contingency tables).
Synopsis
#example for Statistics::ChisqIndep
use strict;
use Statistics::ChisqIndep;
use POSIX;
# input data in the form of the array of array references
my @obs = ([15, 68, 83], [23,47,65]);
my $chi = new Statistics::ChisqIndep;
$chi->load_data(\@obs);
# print the summary data along with the contingency table
$chi->print_summary();
#print the contingency table only
$chi->print_contingency_table();
#the following output is the same as calling the function of print_summary
#all of the detailed info such as the expected values, degrees of freedom
#and totals are accessible as object globals
#check if the load_data() call is successful
if($chi->{valid}) {
  print "Rows: ", $chi->{rows}, "\n";
  print "Columns: ", $chi->{cols}, "\n";
  print "Degree of Freedom: ", $chi->{df}, "\n";
  print "Total Count: ", $chi->{total}, "\n";
  print "Chi-square Statistic: ",
        $chi->{chisq_statistic}, "\n";
  print "p-value: ", $chi->{p_value}, "\n";
  print "Warning: some of the cell counts might be too low.\n"
        if ($chi->{warning});
  #output the contingency table
  my $rows = $chi->{rows}; # number of rows
  my $cols = $chi->{cols}; # number of columns
  my $obs = $chi->{obs}; # observed values
  my $exp = $chi->{expected}; # expected values
  my $rtotals = $chi->{rtotals}; # row totals
  my $ctotals = $chi->{ctotals}; # column totals
  my $total = $chi->{total}; # total counts
  for (my $j = 0; $j < $cols; $j++) {
    print "\t",$j + 1;
  }
  print "\trtotal\n";
  for (my $i = 0; $i < $rows; $i ++) {
    print $i + 1, "\t";
    for(my $j = 0 ; $j < $cols; $j ++) {
      #observed values can be accessed
      #in the following way
      print $obs->[$i]->[$j], "\t";
    }
    #row totals can be accessed
    #in the following way
    print $rtotals->[$i], "\n";
    print "\t";
    for(my $j = 0 ; $j < $cols; $j ++) {
      #expected values can be accessed
      #in the following way
      printf "(%.2f)\t", $exp->[$i]->[$j];
    }
    print "\n";
  }
  print "ctotal\t";
  for (my $j = 0; $j < $cols; $j++) {
    #column totals can be accessed in the following way
    print $ctotals->[$j], "\t";
  }
  #output total counts
  print $total, "\n";
}
This module performs Pearson's chi-squared test on two-dimensional contingency tables. The user supplies the observed values in table form and the module computes the expected value for each cell under the independence hypothesis. It then computes the chi-square statistic and the corresponding p-value from the observed and expected values to test whether the 2 dimensions are truly independent.
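The expected values and the statistic described above can be reproduced in a few lines of core Perl. This is a sketch for illustration, not the module's code; the helper name chisq_independence is hypothetical, and the p-value is left out because it needs the chi-square distribution.

```perl
use strict;
use warnings;

# Hypothetical helper, not the module's code: expected counts and the
# Pearson chi-square statistic for a two-way contingency table.
sub chisq_independence {
    my ($obs) = @_;
    my (@rtotals, @ctotals);
    my $total = 0;
    for my $i (0 .. $#$obs) {
        for my $j (0 .. $#{ $obs->[$i] }) {
            $rtotals[$i] += $obs->[$i][$j];
            $ctotals[$j] += $obs->[$i][$j];
            $total       += $obs->[$i][$j];
        }
    }
    my ($chisq, @expected) = (0);
    for my $i (0 .. $#rtotals) {
        for my $j (0 .. $#ctotals) {
            # Under independence: row total * column total / grand total.
            my $e = $rtotals[$i] * $ctotals[$j] / $total;
            $expected[$i][$j] = $e;
            $chisq += ($obs->[$i][$j] - $e) ** 2 / $e;
        }
    }
    my $df = (@rtotals - 1) * (@ctotals - 1);
    return ($chisq, $df, \@expected);
}

# Same table as in the synopsis above.
my @obs = ([15, 68, 83], [23, 47, 65]);
my ($chisq, $df, $expected) = chisq_independence(\@obs);
printf "chi-square = %.4f, df = %d, expected[0][0] = %.2f\n",
    $chisq, $df, $expected->[0][0];
```

For this table the first expected cell is 166 * 38 / 301, roughly 20.96, and the degrees of freedom are (2 - 1) * (3 - 1) = 2.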
Download (0.003MB)
Added: 2006-12-18 License: Perl Artistic License Price:
1040 downloads
Statistics::MaxEntropy 0.9
MaxEntropy is a Perl5 module for Maximum Entropy Modeling and Feature Induction.
SYNOPSIS
use Statistics::MaxEntropy;
# debugging messages; default 0
$Statistics::MaxEntropy::debug = 0;
# maximum number of iterations for IIS; default 100
$Statistics::MaxEntropy::NEWTON_max_it = 100;
# minimal distance between new and old x for Newton's method;
# default 0.001
$Statistics::MaxEntropy::NEWTON_min = 0.001;
# maximum number of iterations for Newton's method; default 100
$Statistics::MaxEntropy::KL_max_it = 100;
# minimal distance between new and old x; default 0.001
$Statistics::MaxEntropy::KL_min = 0.001;
# the size of Monte Carlo samples; default 1000
$Statistics::MaxEntropy::SAMPLE_size = 1000;
# creation of a new event space from an events file
$events = Statistics::MaxEntropy::new($file);
# Generalised Iterative Scaling, "corpus" means no sampling
$events->scale("corpus", "gis");
# Improved Iterative Scaling, "mc" means Monte Carlo sampling
$events->scale("mc", "iis");
# Feature Induction algorithm, also see Statistics::Candidates POD
$candidates = Statistics::Candidates->new($candidates_file);
$events->fi("iis", $candidates, $nr_to_add, "mc");
# writing new events, candidates, and parameters files
$events->write($some_other_file);
$events->write_parameters($file);
$events->write_parameters_with_names($file);
# dump/undump the event space to/from a file
$events->dump($file);
$events->undump($file);
This module is an implementation of the Generalised and Improved Iterative Scaling (GIS, IIS) algorithms and the Feature Induction (FI) algorithm as defined in (Darroch and Ratcliff 1972) and (Della Pietra et al. 1997). The purpose of the scaling algorithms is to find the maximum entropy distribution given a set of events and (optionally) an initial distribution.
A set of candidate features may also be specified; the FI algorithm can then be applied to find and add the candidate feature(s) that give the largest gain, in terms of Kullback-Leibler divergence, when added to the current set of features.
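As a rough illustration of measuring such a gain, the sketch below computes the Kullback-Leibler divergence between an empirical distribution and two models, assuming the gain is read as the reduction in divergence once a feature is added. The helper `kl_divergence` and all three distributions are made up for this example and are not part of the module.

```perl
use strict;
use warnings;

# Hypothetical helper: D(p || q) = sum_x p(x) * log(p(x) / q(x)).
# Terms with p(x) == 0 contribute nothing; q(x) must be positive
# wherever p(x) is positive.
sub kl_divergence {
    my ($p, $q) = @_;
    my $d = 0;
    for my $x (0 .. $#$p) {
        next if $p->[$x] == 0;
        $d += $p->[$x] * log($p->[$x] / $q->[$x]);
    }
    return $d;
}

my @empirical = (0.6, 0.3, 0.1);       # made-up relative frequencies
my @before    = (1/3, 1/3, 1/3);       # made-up model before adding a feature
my @after     = (0.55, 0.33, 0.12);    # made-up model after adding a feature

# Gain = how much closer the model moves to the empirical distribution.
my $gain = kl_divergence(\@empirical, \@before)
         - kl_divergence(\@empirical, \@after);
printf "gain = %.4f\n", $gain;
```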
Events are specified in terms of a set of feature functions (properties) f_1...f_k that map each event to {0,1}: an event is a string of bits. In addition, the frequency of each event is given. We assume the event space to have a probability distribution that can be described by
The module requires the Bit::SparseVector module by Steffen Beyer and the Data::Dumper module by Gurusamy Sarathy. Both can be obtained from CPAN just like this module.
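To make the scaling idea concrete, here is a minimal sketch of Generalised Iterative Scaling over a toy event space, written in core Perl. It does not use or reproduce the module: the helper `gis`, its arguments, the slack feature, and the three events with their frequencies are all assumptions for illustration, and the model is summed over the listed events only (no sampling).

```perl
use strict;
use warnings;
use List::Util qw(max sum);

# Hypothetical toy event space: bit string => frequency.
my %freq = ('10' => 6, '01' => 3, '11' => 1);

# Hypothetical helper: fits p(x) proportional to exp(sum_i lambda_i f_i(x))
# so that the model's feature expectations match the empirical ones.
sub gis {
    my ($freq, $max_it, $min) = @_;
    my @events = sort keys %$freq;
    my $total  = sum(values %$freq);

    # One 0/1 feature per bit, plus a slack feature so that the features
    # of every event sum to the same constant C (required by GIS).
    my %f;
    $f{$_} = [ split //, $_ ] for @events;
    my $C = max(map { sum(@{ $f{$_} }) } @events);
    push @{ $f{$_} }, $C - sum(@{ $f{$_} }) for @events;
    my $k = scalar @{ $f{ $events[0] } };

    # Empirical feature expectations.
    my @target = (0) x $k;
    for my $e (@events) {
        $target[$_] += $freq->{$e} / $total * $f{$e}[$_] for 0 .. $k - 1;
    }

    my @lambda = (0) x $k;
    my (%p, @model);
    for my $it (1 .. $max_it) {
        # Current model distribution.
        my (%w, $Z);
        for my $e (@events) {
            $w{$e} = exp(sum(map { $lambda[$_] * $f{$e}[$_] } 0 .. $k - 1));
            $Z += $w{$e};
        }
        $p{$_} = $w{$_} / $Z for @events;

        # Model feature expectations.
        @model = (0) x $k;
        for my $e (@events) {
            $model[$_] += $p{$e} * $f{$e}[$_] for 0 .. $k - 1;
        }

        # Stop once the expectations agree closely enough.
        last if max(map { abs($target[$_] - $model[$_]) } 0 .. $k - 1) < $min;

        # GIS update: lambda_i += log(target_i / model_i) / C.
        for my $i (0 .. $k - 1) {
            next if $target[$i] == 0;    # feature never observed: leave as is
            $lambda[$i] += log($target[$i] / $model[$i]) / $C;
        }
    }
    return (\%p, \@target, \@model);
}

my ($p, $target, $model) = gis(\%freq, 1000, 1e-8);
printf "%s  p = %.4f\n", $_, $p->{$_} for sort keys %$p;
```

In this toy case the features are rich enough that the fitted distribution coincides with the relative frequencies; with fewer features than events, the result would be the smoothest distribution consistent with the feature expectations.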
Download (0.041MB)
Added: 2007-05-23 License: GPL (GNU General Public License) Price:
886 downloads