Multicollinearity Model Building and Performance Evaluation - ISEL-HGU/IntegratedDPModel GitHub Wiki

Tool Overview

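This tool reproduces the experiments from the multicollinearity paper. To check whether defect prediction models lose performance when the dataset exhibits multicollinearity, it sets up 11 types of models, with and without multicollinearity removal techniques: None, Default-PCA, NSVIF10, NSVIF5, NSVIF4, NSVIF2.5, SVIF10, SVIF5, SVIF4, SVIF2.5, and VCRR. The tool builds each of these models and evaluates its performance.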
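The page does not spell out what separates the NSVIF and SVIF variants. As a hedged illustration only, the pure-Python sketch below assumes the numeric suffix (10, 5, 4, 2.5) is a variance inflation factor (VIF) cutoff and shows two plausible ways to apply it: a one-shot filter that drops every feature above the cutoff, and a stepwise filter that drops the worst feature and recomputes. The function names are hypothetical and are not part of MulticollinearityExpTool.

```python
from typing import Dict, List, Sequence

Columns = Dict[str, Sequence[float]]


def _correlation_matrix(columns: List[Sequence[float]]) -> List[List[float]]:
    """Pearson correlation matrix of the given feature columns."""
    n = len(columns[0])
    centered = []
    for col in columns:
        if len(col) != n:
            raise ValueError("all feature columns must have the same length")
        mean = sum(col) / n
        centered.append([x - mean for x in col])
    norms = [sum(x * x for x in c) ** 0.5 for c in centered]
    if any(norm == 0 for norm in norms):
        raise ValueError("constant feature: correlation (and VIF) is undefined")
    k = len(columns)
    return [
        [sum(a * b for a, b in zip(centered[i], centered[j])) / (norms[i] * norms[j])
         for j in range(k)]
        for i in range(k)
    ]


def _invert(matrix: List[List[float]]) -> List[List[float]]:
    """Gauss-Jordan inversion with partial pivoting."""
    k = len(matrix)
    aug = [list(row) + [1.0 if i == j else 0.0 for j in range(k)]
           for i, row in enumerate(matrix)]
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("singular matrix: features are perfectly collinear")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(k):
            if r != col:
                factor = aug[r][col]
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[col])]
    return [row[k:] for row in aug]


def vif(features: Columns) -> Dict[str, float]:
    """VIF_j = 1 / (1 - R_j^2): the j-th diagonal entry of the inverse correlation matrix."""
    if not features:
        return {}
    names = list(features)
    inverse = _invert(_correlation_matrix([features[n] for n in names]))
    return {name: inverse[i][i] for i, name in enumerate(names)}


def one_shot_vif_filter(features: Columns, threshold: float) -> List[str]:
    """Compute VIF once and keep every feature at or below the cutoff."""
    return [name for name, score in vif(features).items() if score <= threshold]


def stepwise_vif_filter(features: Columns, threshold: float) -> List[str]:
    """Repeatedly drop the feature with the highest VIF until all are at or below the cutoff."""
    kept = dict(features)
    while len(kept) > 1:
        scores = vif(kept)
        worst = max(scores, key=scores.get)
        if scores[worst] <= threshold:
            break
        del kept[worst]
    return list(kept)
```

With two nearly identical features and one unrelated feature, `one_shot_vif_filter(data, 10.0)` drops both near-duplicates, while `stepwise_vif_filter(data, 10.0)` drops only one of them, because removing it brings the other's VIF back under the cutoff. Which strategy, if either, the tool actually implements is not stated on this page.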

Manual script steps

  1. git clone https://github.com/ISEL-HGU/MulticollinearityExpTool.git
  2. cd MulticollinearityExpTool && gradle distZip
  3. unzip build/distributions/MulticollinearityExpTool.zip
  4. cd MulticollinearityExpTool/bin/, then run multisearch_None_PCA_VIF_Eval.sh and multisearch_VCRR_Eval.sh from that directory.
    1. These scripts build models with parameter tuning applied (the -e, -u, and -v options).
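The manual steps can also be scripted. The Python sketch below is a hypothetical helper, not part of the repository: it assumes that steps 2 and 3 run from the root of the cloned repository, that `git`, `gradle`, `unzip`, and `bash` are on the `PATH`, and that the distribution zip unpacks into a `MulticollinearityExpTool/` directory (which is what step 4 implies). By default it only prints the commands.

```python
import subprocess
from pathlib import Path
from typing import List, Tuple

REPO_URL = "https://github.com/ISEL-HGU/MulticollinearityExpTool.git"
# The two experiment scripts named in step 4.
SCRIPTS = ["multisearch_None_PCA_VIF_Eval.sh", "multisearch_VCRR_Eval.sh"]

Step = Tuple[List[str], Path]


def build_plan(workdir: Path) -> List[Step]:
    """Return (command, working directory) pairs that mirror the manual steps."""
    repo = workdir / "MulticollinearityExpTool"          # created by git clone
    bin_dir = repo / "MulticollinearityExpTool" / "bin"   # created by unzip (assumed layout)
    plan: List[Step] = [
        (["git", "clone", REPO_URL], workdir),
        (["gradle", "distZip"], repo),
        (["unzip", "build/distributions/MulticollinearityExpTool.zip"], repo),
    ]
    # Invoke the scripts through bash so they run even without the execute bit.
    plan += [(["bash", script], bin_dir) for script in SCRIPTS]
    return plan


def run_plan(plan: List[Step], dry_run: bool = True) -> None:
    """Print each step; execute it only when dry_run is False."""
    for cmd, cwd in plan:
        print(f"[{cwd}] $ {' '.join(cmd)}")
        if not dry_run:
            subprocess.run(cmd, cwd=cwd, check=True)


if __name__ == "__main__":
    run_plan(build_plan(Path.cwd()), dry_run=True)
```

Pass `dry_run=False` to actually execute the steps; `check=True` makes the helper stop at the first command that exits with a non-zero status.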

Options

usage: MulticollinearityExpTool -c <csv file location> -d <data unbalancing mode> [-e
       <MultiSearch Evaluation option>] -f <the number of cross-validation
       folds> [-h] -i <number of cross-validation iterations> -m <machine
       learning model> -o <path> -p <thread pool size> -s <file> -t
       <attribute value> [-u <parameter tuning option>] [-v <flag of
       parameter tuning>]
Multicollinearity paper experiment tool
 -c,--csv <csv file location>                        file path of the
                                                     output CSV file.
 -d,--dataUnbalancingMode <data unbalancing mode>    1 is no handling of
                                                     data imbalance, 2 is
                                                     applying spread
                                                     subsampling, or 3 is
                                                     applying SMOTE
 -e,--evaluation <MultiSearch Evaluation option>     1 is AUC or 2 is
                                                     Fmeasure or 3 is MCC
                                                     or 4 is Precision or
                                                     5 is Recall
 -f,--fold <the number of cross-validation folds>    the number of
                                                     cross-validation
                                                     folds
 -h,--help                                           Help
 -i,--iter <number of cross-validation iterations>   number of
                                                     cross-validation
                                                     iterations
 -m,--model <machine learning model>                 machine learning
                                                     model
 -o,--originaldata <path>                            path to original data
                                                     before creating
                                                     cross-validation data
 -p,--pool <thread pool size>                        thread pool size
 -s,--source <file>                                  source arff file path
                                                     to train a prediction
                                                     model
 -t,--type <attribute value>                         1 is the original
                                                     dataset or applying
                                                     PCA or VIF to remove
                                                     multicollinearity or
                                                     2 is applying
                                                     Correlation-based
                                                     feature selection or
                                                     3 is applying
                                                     Wrapper-based feature
                                                     selection or 4 is
                                                     applying Variable
                                                     clustering and
                                                     removing redundant
                                                     metrics.
 -u,--tuning <parameter tuning option>               parameter tuning
                                                     option. 1 is
                                                     GridSearch or 2 is
                                                     CVParameterSelection
                                                     or 3 is MultiSearch
 -v,--tuningflag <flag of parameter tuning>          true or false
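Because several options take numeric codes, a small wrapper can catch invalid values before a long experiment starts. The Python sketch below is hypothetical: it assumes the Gradle-generated launcher in `bin/` is named `MulticollinearityExpTool` (as in the usage line), accepts any string for `-m` because the valid model names are not listed here, and only builds the argument list rather than running it.

```python
import shlex
from typing import Dict, List, Optional

# Option codes as documented in the help text above.
DATA_UNBALANCING = {1: "no handling", 2: "spread subsampling", 3: "SMOTE"}
EVALUATION = {1: "AUC", 2: "F-measure", 3: "MCC", 4: "Precision", 5: "Recall"}
ATTRIBUTE_TYPE = {
    1: "original dataset, or PCA/VIF multicollinearity removal",
    2: "correlation-based feature selection",
    3: "wrapper-based feature selection",
    4: "variable clustering, removing redundant metrics",
}
TUNING = {1: "GridSearch", 2: "CVParameterSelection", 3: "MultiSearch"}


def _check_code(flag: str, value: int, allowed: Dict[int, str]) -> str:
    if value not in allowed:
        raise ValueError(f"{flag} must be one of {sorted(allowed)}, got {value!r}")
    return str(value)


def _check_positive(flag: str, value: int) -> str:
    if value <= 0:
        raise ValueError(f"{flag} must be a positive integer, got {value!r}")
    return str(value)


def build_args(
    csv: str, data_unbalancing: int, folds: int, iterations: int, model: str,
    original_data: str, pool: int, source: str, attribute_type: int,
    evaluation: Optional[int] = None, tuning: Optional[int] = None,
    tuning_flag: Optional[bool] = None,
    launcher: str = "./MulticollinearityExpTool",
) -> List[str]:
    """Build an argument vector; the required/optional split follows the usage line."""
    args = [
        launcher,
        "-c", csv,
        "-d", _check_code("-d", data_unbalancing, DATA_UNBALANCING),
        "-f", _check_positive("-f", folds),
        "-i", _check_positive("-i", iterations),
        "-m", model,
        "-o", original_data,
        "-p", _check_positive("-p", pool),
        "-s", source,
        "-t", _check_code("-t", attribute_type, ATTRIBUTE_TYPE),
    ]
    if evaluation is not None:
        args += ["-e", _check_code("-e", evaluation, EVALUATION)]
    if tuning is not None:
        args += ["-u", _check_code("-u", tuning, TUNING)]
    if tuning_flag is not None:
        args += ["-v", "true" if tuning_flag else "false"]
    return args


if __name__ == "__main__":
    # All paths and the model name below are hypothetical placeholders.
    argv = build_args(
        csv="results.csv", data_unbalancing=1, folds=10, iterations=10,
        model="MODEL_NAME", original_data="data/original", pool=4,
        source="data/project.arff", attribute_type=1,
        evaluation=1, tuning=3, tuning_flag=True,
    )
    print(shlex.join(argv))
```

The resulting list can be passed directly to `subprocess.run`; `shlex.join` is used only to print a copy-pasteable command line.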
