########## Version 0.10.1 (December 2009) ###################

* A new criterion for variable selection in a generalized linear model context was added: the Wald criterion. Files wald.R (in /R) and wald.Rd (in /man) were added.
* All search algorithm files (anneal.R, genetic.R, improve.R and leaps.R) were changed to cater for the Wald criterion.
* All algorithm man files were changed to account for the Wald criterion.
* The default criterion has been changed to "default" and is set to "RM" if r=0 and to "TAU2" if r>0.
* Changes were made to the validation.R file, to cater for the Wald criterion.
* For compatibility with the output of other search functions, "drop=FALSE" was added to the subsetting of the output of the "genetic" function, when "kabort" is involved.
* Changes were made to ensure that the C++ code followed the standards. Problems had arisen with gcc 4.4 and the SunStudio compilers.

########## Version 0.9.9993 (January 2009) ###########

Changes to the parts of the C++ code which broke the package. ("#include " added in sscma.cpp)

########## Version 0.9.9992 (February 2007) ###########

Changes to the parts of the Fortran code which broke the package on Linux 64-bit machines (e.g.: conversions between types were smoothed out).

########## Version 0.9.9991 (January 2007) ###########

Changes to the parts of the C code which could break the package on Linux 64-bit machines (e.g.: use of "long" data type was changed to "int").

########## Version 0.9.999 (October 2006) ############

1) Changes in the C code required by the transition to R 2.4 (and the use of a new C compiler).
2) The leaps.R, lmHmat.default and glhHmat.default functions have undergone minor changes for greater numerical stability.
3) The "qr" criterion used when checking for positive definiteness of the input matrix in the validmat.R function was removed for coherence with the use of the "tolval" argument.
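
The "drop=FALSE" change listed under version 0.10.1 relies on a general feature of R subsetting, which can be sketched in base R as follows. The array below is a made-up stand-in with hypothetical dimensions, not actual output of the "genetic" function: it only shows that selecting a single slice silently removes a dimension unless "drop=FALSE" is given.

```r
# Hypothetical 3-d array (2 x 3 x 2); NOT real output of a search function.
subsets <- array(1:12, dim = c(2, 3, 2))

# Default subsetting drops the dimension of extent 1: the result is a 3 x 2 matrix.
dropped <- subsets[1, , ]

# With drop = FALSE all three dimensions are kept: the result is a 1 x 3 x 2 array.
kept <- subsets[1, , , drop = FALSE]

print(dim(dropped))
print(dim(kept))
```

Code that expects a 3-d array in every case can therefore index "kept" in the same way regardless of how many slices were selected.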

########## Version 0.9.99 (June 2006) ############

1) Three new functions have been added to "pre-process" the data for submission to the search algorithms using any of the four new "multivariate linear hypothesis" criteria: function "lmHmat" creates the input for a linear model (linear regression), function "ldaHmat" does the same for a linear discriminant analysis and "glhHmat" does likewise for the general linear hypothesis.
2) The "call" item in the output list of the "leaps" function (which had been mistakenly dropped in version 0.9-9) was re-added.

########## Version 0.9.9 (May 2006) ############

Some major changes, as well as some minor ones.

1) Four new criteria have been added: "CCR1_2", "TAU_2", "XI_2" and "ZETA_2". All are criteria for subset selection in the context of the multivariate linear model. The four criteria were added as options for the "criterion" argument in the subset selection algorithms (anneal, genetic, improve and leaps).
2) Four new functions were created to compute the value of the new criteria when given a total matrix, a model effects matrix (H) with its expected rank (r) and a subset of variables. These new functions are called "ccr12.coef", "tau2.coef", "xi2.coef" and "zeta2.coef", and join the similar functions "gcd.coef", "rm.coef" and "rv.coef" which already existed.
3) Two new auxiliary functions were created: "validmat" and "validnovcrit", which include validation checks for the symmetry and positive definiteness of the total matrix and for the suitability of the H and r arguments of the new criteria, respectively. These are not to be called directly by the user.
4) The validation part of the old and new functions has been further modularized, with calls to the new validation functions mentioned above.
5) A new argument "tolsym", with default value 1000*.Machine$double.eps, has been added to the four search functions (anneal, genetic, improve and leaps).
It is used in the symmetry checks of the total and effects (H) matrices. When the supplied matrices have corresponding elements differing by more than this value, they are rejected. If corresponding elements differ, but by less than this value, the supplied matrix is replaced by its symmetric part and a warning is issued.
6) Man files were written for the new functions, and the help files for the search functions were improved, with examples using the new criteria.

########## Version 0.9.1 (June 2005) #############

1) The default value of argument "tolval" in functions "trim.matrix", "leaps", "anneal", "genetic" and "improve" has been changed to 10*.Machine$double.eps (previously .Machine$double.eps).
2) Correction of minor mistakes in the "leaps" help page and in function "validation" (which is not in the Namespace) - changes made to the "leaps" function between versions 0.8 and 0.9 had not been updated in this function and help page.

########## Version 0.9 (March 2005) #############

1) Namespace added.
2) A new function "trim.matrix" was introduced, to deal with the issue of multicollinearities in the input data.
3) The four search functions ("leaps", "anneal", "genetic" and "improve") now perform a test for multicollinearity, which consists of comparing the ratio of the smallest to the largest eigenvalue of the input covariance/correlation matrix with the new argument "tolval" (.Machine$double.eps, by default). If this ratio is smaller than "tolval", the function exits with a message to the user requiring that the input matrix be cleared of its ill-conditioning (perhaps through the use of the new function "trim.matrix") before resubmitting it to the search function. The use of the search functions can be forced, by lowering the value of "tolval" (negative values of "tolval" are not allowed), but with risks of numerical inaccuracy.
4) The previous maximum number of variables allowed for the search functions (300) was eliminated.
The code now runs for any number of variables. A test for input data with more than 400 variables was introduced in the search functions, and when that condition tests positive, the function stops execution and passes a message to the user, who may choose to run the function anyway, by setting a new function argument ("force") to TRUE. In that case the code will run with any number of variables, but may crash the R session due to memory problems.
5) Non-standard C++ code in the routines was changed and code which caused memory errors was also modified. The new code passes the "valgrind" memory checks.
6) Some Fortran code which did not comply with Fortran 77 standards was changed (e.g.: symbols such as ">" changed to ".GT."; comments using "*" in the first column; etc.). Some non-standard code was kept (use of variable array sizes in subroutine declarations; use of do-loops and do-while-loops).
7) The default behaviour when a non-square matrix is passed as input data to the search functions has been changed: instead of assuming that the covariance matrix of the input data matrix was wanted, the new option is to compute the input data matrix's *correlation* matrix.
8) The default value for "pcindices" has been changed in the "leaps" function and is now "first_k" as in the "anneal", "genetic" and "improve" functions. This change is possible due to changes in the "leaps" source code.

########## Version 0.8 (May 2004) ##############

1) Output lists for the four search functions (leaps, anneal, genetic and improve) now have a fifth object: "$call", which gives the match.call() of the command that produced the output.
2) The C++ routine to compute the eigendecomposition for the "GCD" criterion in the "leaps" function has been improved. A more exact routine based on the QR decomposition is now used (see file "qldiagon.cpp" in the /src subdirectory).
3) Validation of input for the four search functions has now been transferred to separate modules ("validation.R", "validannimp.R" and "validgenetic.R"), which are mentioned in "/man/subselect-internal.Rd". Some minor changes to these validation checks were made.
4) The default values for "pcindices" were changed in the "leaps" function (where it is now NULL) and in the "anneal", "genetic" and "improve" functions (where it is now "first_k"). This seeks to highlight the behaviour of the functions when the GCD criterion (which uses "pcindices") is invoked with the default value of pcindices: "leaps" explicitly requires a set of pcindices, whereas the other three search functions will, by default, compare, for each cardinality "k" that has been requested, the k-variable subsets with the first k PCs.

########## Version 0.7.1 (March 2004) ##############

1) Two lines defining constants "TRUE" and "FALSE" in the C++ file src/matrixb.h, which gave errors when compiling in Mac OS X, were changed. The same change was made in file SR_vsda.h, and similar changes with the constants "EPSLON" were made in files lagmat.cpp, gaussjel.cpp, gausstrg.cpp, matfact.cpp, pdiagon.cpp and simgaussjell.cpp. Constants "MIN" and "MAX" in files SR_vsda.h, sr_sscma.cpp and sr_wrkf1.cpp were renamed MINIMZ and MAXIMZ, respectively.
2) Warnings in the R code and help files were introduced, to caution against the difference in default behaviour of the "leaps" function - when compared to the search algorithm functions "anneal", "genetic" and "improve" - in case the "GCD" criterion is requested. In the latter functions, GCD values compare (by default) the subspaces spanned by k-variable subsets and by the first k PCs. In the leaps function, GCD values, by default, compare the subspaces spanned by k-variable subsets and by the first *kmin* PCs (i.e., the cardinalities of the subsets of variables and PCs are not equal, except for the first cardinality requested).
This option is directly related to the nature of the leaps algorithm.

########## Version 0.7 (March 2004) ##############

1) A major new function has been included: "leaps". Leaps implements Duarte Silva's adaptation of Furnival and Wilson's Leaps and Bounds Algorithm for variable selection in Regression Analysis. The bulk of computations are carried out by C++ routines.
2) New checks for symmetry and positive-definiteness of the input matrix were added.
3) The references in the help files were updated.

########## Version 0.4.1 (February 2004) #########

1) Changed file gcd.Rd, which had the old default option for the argument pcindices. This generated a codoc mismatch.

########## Version 0.4 (April 2003) ##############

1) Version 1.7 of R includes LAPACK. The configure scripts to check for the presence of LAPACK in the system have been deleted.
2) Argument "initialsol" in functions "anneal" and "improve", as well as argument "initialpop" in function "genetic" are now NULL by default. The validations and initializations have been changed as a result.
3) Argument "pcindices" in function "gcd.coef" is initially NULL and will be set to 1:kmax unless the user has specified a set of PC indices.
4) Validation tests were added in functions "gcd.coef", "rm.coef" and "rv.coef" to check whether user-specified variable (and/or PC) indices are integers.
5) Validation tests were added in functions "anneal", "improve" and "genetic" to test whether initial solutions or populations are specified as arrays of integers.
6) A bug was fixed in functions "anneal", "genetic" and "improve" in those cases when argument "pcindices" is defined by the user. A logical comparison between a vector "pcindices" and another vector was being treated as if a global value for equality were returned.

########## Version 0.3 (October 2002) ##############

1) Changes to the configure scripts necessary for configuration on Mac OS X.
2) Functions anneal, genetic and improve now include checks for covariance/correlation matrices not of full rank. If non-full rank matrices are given, the functions exit with a warning.

########## Version 0.2 (June 2002) #################

The following changes were made in Version 0.2 of R package subselect:

1) A bug was corrected in function anneal: when more than one solution (nsol > 1) was requested, the initial temperature for subsequent solutions was not being updated.
2) The Fortran code was changed to use R's default random number generator, introducing a call to R as described in pages 48 and 50-51 of the "Writing R Extensions" manual, version 1.5. As a consequence, the LAPACK routine SLARUV is no longer used and the "iseed" argument of functions "anneal", "genetic" and "improve" has been replaced by the logical argument "setseed".
3) Initial solutions (variable subsets) may now be fed by the user to each of functions "anneal" (argument "initialsol"), "genetic" (argument "initialpop") and "improve" (argument "initialsol").
4) The Fortran code was changed to pass messages directly to the R GUI (which did not occur in the Windows version), as described in page 50 of the "Writing R Extensions" manual, version 1.5.
5) The "indices" argument of functions "gcd.coef", "rm.coef" and "rv.coef" can now be 2-d or 3-d arrays. In this way, the "$subsets" and "$bestsets" output options of functions "anneal", "genetic" and "improve" can be passed directly to functions "rm.coef", "rv.coef" and "gcd.coef", allowing easy computation of values of any criteria for subsets produced by the search algorithms using a different criterion.
6) The "printfile" option of functions "anneal", "genetic" and "improve" has been dropped, as printing to files is best done using the R environment.
7) Function "anneal" has a new argument, "coolfreq", which controls the frequency with which the simulated annealing temperature is cooled (used to be hard-coded as every 20 iterations in the Fortran code of version 0.1).
8) Function "genetic" has a new argument "mutprob", which controls the probability of each offspring in the genetic algorithm undergoing a mutation (i.e., being fed to the restricted local improvement algorithm), if the "mutate" option is set to TRUE (previously *all* children were mutated, which considerably slowed down run times).
9) Function "full.k.search" has been dropped from the package. It was computationally inefficient and not in the general spirit of the package. For computationally efficient full-search algorithms, see program SSCMA by Pedro Duarte Silva of the Universidade Catolica do Porto, available at http://porto.ucp.pt/psilva
10) The help files were updated accordingly. Details of the "rv.coef" and "rm.coef" functions were also introduced.
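
The role of the "coolfreq" argument described in item 7 of version 0.2 can be illustrated with a small base R sketch. This is not the package's Fortran code: the function name "cooling_schedule", the geometric cooling rule and the values of "temp0" and "cooling" are assumptions made for illustration only. The sketch simply lowers a temperature once every "coolfreq" iterations, using 20 to mimic the value that was hard-coded in version 0.1.

```r
# Hypothetical sketch of a cooling schedule; names and cooling rule are assumed,
# they are not taken from the package source.
cooling_schedule <- function(niter, temp0 = 1, cooling = 0.05, coolfreq = 20) {
  temps <- numeric(niter)
  temp <- temp0
  for (i in seq_len(niter)) {
    temps[i] <- temp                     # temperature in force at iteration i
    if (i %% coolfreq == 0) {
      temp <- temp * (1 - cooling)       # cool once every coolfreq iterations
    }
  }
  temps
}

temps <- cooling_schedule(60)
print(unique(temps))
```

A smaller "coolfreq" makes the temperature fall faster over the same number of iterations; a larger one keeps the search at each temperature for longer.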