Project PPT - cookie10142008/tsmc_intern GitHub Wiki

  • Anomaly Diagnosis of Database Performance by Machine Learning
  • F15MITD 2019/08/26
  • Background & Benefits of SQL Performance Shift Detection
  • Internship Objective: Solving Abnormal Patterns
  • Construct an efficient way to identify the SQL causing anomalies
  • Find out the abnormal patterns in CIMDB
  • Find out the relationship between SQL and CPU usage
  • Find out which SQL causes abnormal patterns
  • 5 Steps to Implement
  • Suggestions for TSMC
  • Internship at TSMC
  • Background
  • Currently, 300mm FAB production lines depend on fully automated IT systems, and the huge volume of LOT processing data and system transactions relies on heavy I/O to Oracle/DB2 databases.
  • But…
  • Inappropriate AP logic and poorly performing SQL may cause high I/O usage and an upward CPU trend.
  • This sharply slows down the FAB track-in/track-out frequency.
  • Abnormal CPU usage is currently monitored manually, with no prevention in advance.
  • Finding and fixing root causes therefore takes substantial time and manpower.
  1. Detect abnormal patterns in CIMDB early
  2. Detect whether the abnormal pattern is related to SQL
  3. Find out which SQL causes the abnormal pattern
  • Step 1: Identify Abnormal Pattern

  • Automate the detection of peaks in CPU usage charts

  • Step 2: Feature Engineering

  • Enrich the reference data/parameters for the machine learning detection model

  • Step 3: Feature Selection

  • Select key features for the ML detection model

  • Step 4: Model Selection

  • Build different ML detection models to find out which SQL causes the peak

  • Step 5: Model Evaluation

  • Evaluate the accuracy of the ML detection model based on the peaks found

  • Concept:

    • Predefine a threshold; if CPU usage exceeds it, treat the point as a peak
  • Problem:

    • Time is spent predefining the threshold
    • The DB trend may change, so a fixed threshold can become outdated
  • Concept:

    • Use the K-means ML algorithm to split the dataset into a peak cluster and a non-peak cluster
  • Advantage:

    • ML automatically defines the threshold and identifies peaks
    • The threshold can be adjusted whenever our needs change
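
The K-means concept above can be sketched for the one-dimensional case. This is a minimal pure-Python sketch, not the project's code: it assumes the CPU usage chart is available as a non-empty list of per-interval utilisation percentages, initialises the two centroids at the minimum and maximum values, and takes the midpoint between the converged centroids as the learned threshold (an illustrative choice). The function name `kmeans_peaks` and the sample values are hypothetical; scikit-learn's `sklearn.cluster.KMeans(n_clusters=2)` performs the same kind of clustering.

```python
def kmeans_peaks(cpu_usage, max_iter=100):
    """Split CPU-usage samples into a non-peak and a peak cluster (1-D K-means, k=2).

    Hypothetical helper: returns the learned threshold (midpoint of the two
    centroids) and the indices of samples assigned to the peak cluster.
    """
    # Initialise the two centroids at the extremes of the data.
    low, high = min(cpu_usage), max(cpu_usage)
    for _ in range(max_iter):
        # Assignment step: each sample joins its nearest centroid (ties go to "low").
        low_pts = [x for x in cpu_usage if abs(x - low) <= abs(x - high)]
        high_pts = [x for x in cpu_usage if abs(x - low) > abs(x - high)]
        # Update step: move each centroid to the mean of its cluster.
        new_low = sum(low_pts) / len(low_pts)
        new_high = sum(high_pts) / len(high_pts) if high_pts else high
        if new_low == low and new_high == high:
            break  # assignments no longer change, so the centroids have converged
        low, high = new_low, new_high
    threshold = (low + high) / 2
    peaks = [i for i, x in enumerate(cpu_usage) if x > threshold]
    return threshold, peaks


# Made-up CPU usage (%) per interval, for illustration only.
cpu = [20, 22, 21, 19, 85, 23, 90, 20]
threshold, peaks = kmeans_peaks(cpu)
```

Because the threshold is recomputed from the data on every run, applying the function to a newer window reflects a changed DB trend without hand-tuning a fixed value.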
  • Task 2: Pre-task of feature engineering

  • Purpose: Collect more features for the ML model to use as references when identifying the SQL causing abnormal patterns

  • Method:

  • Remove noise (e.g., empty values that distort the ML model's judgment)

  • Data type transformation (convert non-numeric values, e.g., letters, into numeric values the model can process)

  • Join tables (combine the SQL performance table and the SQL usage frequency table)

  • Benefit: Improves the accuracy of the ML model
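
The three preprocessing steps can be illustrated on small in-memory tables. This is a sketch under assumptions: the table layouts, the column names (`sql_id`, `module`, `elapsed_ms`, `exec_count`) and the helper functions are hypothetical, not taken from the project. With pandas, the same steps would typically use `DataFrame.dropna`, `pandas.factorize`, and `DataFrame.merge`.

```python
def clean_rows(rows, required):
    """Noise removal: drop rows with an empty value in any required column."""
    return [r for r in rows if all(r.get(col) not in (None, "") for col in required)]


def encode_column(rows, col):
    """Data type transformation: replace string values in `col` with integer codes (in place)."""
    codes = {}
    for r in rows:
        # setdefault assigns the next unused code the first time a value is seen.
        r[col] = codes.setdefault(r[col], len(codes))
    return codes


def join_on(left, right, key):
    """Inner-join two lists of dicts on `key` (performance table + frequency table)."""
    index = {r[key]: r for r in right}
    return [{**row, **index[row[key]]} for row in left if row[key] in index]


# Hypothetical sample tables.
perf = [
    {"sql_id": "q1", "module": "TRACK_IN", "elapsed_ms": 120},
    {"sql_id": "q2", "module": "TRACK_OUT", "elapsed_ms": None},  # noisy row
    {"sql_id": "q3", "module": "LOT_HOLD", "elapsed_ms": 45},
]
freq = [{"sql_id": "q1", "exec_count": 500}, {"sql_id": "q3", "exec_count": 80}]

perf = clean_rows(perf, ["module", "elapsed_ms"])
codes = encode_column(perf, "module")
features = join_on(perf, freq, "sql_id")
```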

  • Task 2: Check whether SQL usage is related to CPU usage (by calculating the Pearson correlation)
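
A minimal sketch of the Pearson check follows. It assumes each SQL statement is summarised as an execution count per time interval, aligned with the CPU usage of the same intervals; the function name and both series are hypothetical illustrative values. Python 3.10+ also provides `statistics.correlation`, and pandas offers `DataFrame.corr`.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length numeric sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    # Covariance and standard deviations (the common 1/n factors cancel out).
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        raise ValueError("Pearson correlation is undefined for a constant series")
    return cov / (sx * sy)


# Hypothetical per-interval execution counts of one SQL and CPU usage (%).
exec_per_interval = [10, 20, 30, 40, 50]
cpu = [15, 25, 33, 47, 55]
r = pearson(exec_per_interval, cpu)
```

A value of `r` close to +1 suggests that CPU usage rises with that SQL's execution count; correlation alone does not establish that the SQL causes the rise.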

  • Purpose: Select key features to make the detection model more precise

  • Method: PCA (principal component analysis), reducing the features from 9987 to 120

  • Result: PCA did not improve the accuracy of the ML detection model
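
The PCA step can be sketched with NumPy's SVD: center each feature, take the top right singular vectors as the principal axes, and project the data onto them. The random matrix is a stand-in with made-up dimensions (200 samples x 500 features reduced to 120), not the project's 9987-feature data, and `pca_reduce` is a hypothetical name; `sklearn.decomposition.PCA(n_components=120)` is the library equivalent. Note that PCA produces linear combinations of the original features rather than keeping a subset of them.

```python
import numpy as np


def pca_reduce(X, n_components):
    """Project the rows of X onto the top `n_components` principal components."""
    Xc = X - X.mean(axis=0)  # center each feature column
    # Rows of Vt are the principal axes, ordered by decreasing singular value.
    _, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    components = Vt[:n_components]
    explained = s**2 / np.sum(s**2)  # fraction of total variance per component
    return Xc @ components.T, explained[:n_components]


# Stand-in data with made-up dimensions.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 500))
Z, ratio = pca_reduce(X, 120)
```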

  • Purpose: Build models with the previously processed parameters to identify which SQL causes the abnormal pattern

  • Method: Refer to scikit-learn model selection and validation results, try suitable classification models, and choose the best one

  • Benefit: Obtain the best model for more precise results

  • Evaluation criteria: hit rate / false alarm rate
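
The slides do not define the two criteria, so this sketch assumes the usual detection definitions: hit rate = TP / (TP + FN), the share of SQL-caused peaks the model flags, and false alarm rate = FP / (FP + TN), the share of non-causing cases it wrongly flags. Labels are 1 for "this SQL caused the peak" and 0 otherwise; the function name and example labels are hypothetical.

```python
def hit_and_false_alarm(y_true, y_pred):
    """Return (hit rate, false alarm rate) for binary labels (1 = SQL caused the peak)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t and p)          # correctly flagged
    fn = sum(1 for t, p in pairs if t and not p)      # missed
    fp = sum(1 for t, p in pairs if not t and p)      # false alarms
    tn = sum(1 for t, p in pairs if not t and not p)  # correctly ignored
    hit = tp / (tp + fn) if tp + fn else 0.0
    false_alarm = fp / (fp + tn) if fp + tn else 0.0
    return hit, false_alarm


# Made-up ground truth and predictions.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 1, 0, 0, 0, 0, 0]
hit, fa = hit_and_false_alarm(y_true, y_pred)
```

Reporting both numbers matters because a model can raise its hit rate simply by flagging more cases, which also raises the false alarm rate.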

  • Task 3: Find out which SQL causes the abnormal pattern

  • Results (hit rate / false alarm rate):
    • SVM: 0.5 / 0.16
    • KNN: 0 / 0.06
    • AdaBoost: 0.36 / 0.07
    • Gradient Boosting: 0.63 / 0.036
    • Ensemble-XGBoost: 0.909 / 0.117

  • About Project

  • Inspiration from the current methodology

  • Collect more DB data from the Hsinchu branch to predict which combinations of SQL usage will cause peaks in the future

  • E.g., running the three SQL statements A, B, and C simultaneously will definitely cause a peak
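
One simple starting point for this idea is a co-occurrence count, sketched under assumptions: each time window is recorded as the set of SQL IDs executed in it plus a peak/non-peak label (for example from the K-means step), and the function name and data are hypothetical. It only measures how often a combination coincides with peaks; it does not by itself show that the combination causes them.

```python
from collections import Counter
from itertools import combinations


def peak_combinations(windows, size=3):
    """For each `size`-combination of SQL IDs, count (windows with a peak, all windows)."""
    in_peak, total = Counter(), Counter()
    for sql_ids, is_peak in windows:
        # Deduplicate and sort so ("B", "A", "C") and ("A", "B", "C") count as the same combination.
        for combo in combinations(sorted(set(sql_ids)), size):
            total[combo] += 1
            if is_peak:
                in_peak[combo] += 1
    return {combo: (in_peak[combo], total[combo]) for combo in total}


# Hypothetical windows: (SQL IDs executed, whether CPU usage peaked).
windows = [
    (["A", "B", "C"], True),
    (["A", "B", "C", "D"], True),
    (["A", "B", "D"], False),
    (["B", "C", "D"], False),
]
counts = peak_combinations(windows)
```

A combination seen only once (such as `("A", "C", "D")` here) can show a 100% peak ratio by chance, so a minimum occurrence count should be applied before ranking combinations.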

  • About the Environment

  • Set up an additional metal gate at the entrance, ahead of the current metal gates

    • To avoid unintentionally bringing in cellphones or laptops

  • Provide easier access to open ML learning resources and analysis tools

  • Only teamwork can ultimately reach the goal

  • Everyone is connected, so nobody can afford to make mistakes

  • Only by keeping the corporation's intelligence confidential can we maintain competitive advantages