ExcelTemplateToGuzzleJobConfigYAML_readme - ja-guzzle/guzzle_docs GitHub Wiki
- Purpose
- Sample template
- Run the app
- Arguments
- Configurations
- Output - Job Config YAML
- Limitations
- Recommendation
- Notes
Goal of this application is to prepare Guzzle job(only processing job, for now) config yaml files from provided excel files.
This application accepts,
- Configuration file (which is in yml file which gives diff information on how to locate diff aspects in the the actual excle)
- Excel file from which job config will be prepared
And Produces,
- job config YAML files
- logs
STTM_Processing_Excel_Template.xlsx
java -jar ExcelTemplateToGuzzleJobConfigYAML.jar "excel=</absolute/path/to/excel.xlsx>" "conf=</absolute/path/to/conf.yml>" "output=</absolute/path/to/outputDir>" "log=</absolute/path/to/logDir>"
- * Path of folder which have source and target query files. ["excel=/absolute/path/to/excel.xlsx"]
- Path of Configuration file. ["conf=/absolute/path/to/conf.yml"] --- [Default: will try to pick config.yml named file from 'excel' argument's directory]
- Path of output directory. ["output=/absolute/path/to/outputDir"] --- [Default: 'output' directory under 'excel' argument's path]
- Path of log directory. ["log=/absolute/path/to/logDir"] --- [Default: 'log' directory under 'excel' argument's path]
Only * argument is mandatory, for others default values will be used.
This confirm is very powerful and gives you a lot of flexibility of how you want to setup your ETL template. It does not go by position by labels of the fields.
general:
value_delimiter: "," # to separate ids of additional column and truncate partition columns e.g.: fr_col, add_col
range_delimiter: ".." # can be used to specify id range when ids are integer e.g.: 1..7,8,10..13. Usually we will never have column names which are integer.
clause:
from: "From" # excel cell value which represents 'from' clause
where: "Filter" # excel cell value which represents 'where' clause
join: "JOINS" # excel cell value which represents 'join' clause
order_by: "Order By" # excel cell value which represents 'order by' clause
group_by: "Group By" # excel cell value which represents 'group by' clause
having: "Having Clause" # excel cell value which represents 'having' clause
sheet_detail:
additional_columns:
sheet_name: "Additional Columns" # name of additional column sheet
header_row_index: "0" # index of row (0 based) from which header of 'additional columns' starts
truncate_partition_columns:
sheet_name: "Truncate Partition Columns" # name of truncate partition column sheet
header_row_index: "0" # index of row (0 based) from which header of 'truncate partition columns' starts
column_type:
primary_key: "P" # put the value you have used in sttm sheet to denote the column is primary key
merge_key: "M" # put the value you have used in sttm sheet to denote the column is merge key
history_column: "H" # put the value you have used in sttm sheet to denote the column is history column
fields: # below are necessary fields used in app, provide the mapping (excel cell value) which represents the field
description: "Description"
job_type: "Job Type"
tags: "Tags"
s_endpoint: "Source Endpoint"
s_technology: "Source Technology"
incremental: "Incremental"
s_table: "Source Schema.Source Table"
sql_query: "SQL"
filter: "Filter On Source"
s_pre_sql: "PreSQL"
s_post_sql: "PostSQL"
s_table_dependency: "Source Table Dependancy"
t_table_dependency: "Target Table Dependancy"
additional_columns: "Add Columns"
t_endpoint: "Target Endpoint"
t_technology: "Target Technology"
t_schemaName.tableName: "Target Schema.Target Table"
operation: "Loading Startegy"
template: "Template"
t_pre_sql: "PreSQL"
t_post_sql: "PostSQL"
soft_delete: "Soft Delete"
truncate_partition_columns: "Truncate Clumn"
sttm_columns:
source_table: "Source Table" # put the value of excel cell which represents the COLUMN 'source table' in sttm table
source_column: "Source Column (with Transformation)" # put the value of excel cell which represents the COLUMN source column in sttm table
expression: "Expression" # Not being used in current excel template
target_column: "Target Column" # put the value of excel cell which represents the COLUMN target column in sttm table
column_type: "Pk/Merge Columns/History Trigger" # put the value of excel cell which represents the COLUMN column_type (to mark column as primary, merge or history column) in sttm table
additional_columns_sheet_columns: # header column mapping of 'additional columns' sheet
column_id: "Key"
name: "Name"
value: "Value"
framework_column: "Framework Column"
framework_generated: "Framework Generated"
truncate_partition_columns_sheet_columns: # header column mapping of 'truncate partition columns' sheet
column_id: "Key"
name: "Column_name"
value: "Value"
- The app will produce 1 job config yaml file for 1 excel sheet (It will exclude certain sheets like "Additioanl Column" "truncate partition column" sheet)
- Generated job config will be named after it's source excel sheet name.
- Job config file may contain invalid query, as it's just merging and replacing strings. Validating formed query is not a part of the app, as of now. Verify before using it in Guzzle.
- It's correctness depends on the use of accurate syntax in excel sheet.
- Range delimeter only works with integer ids (this are unlikely and hence can be ignored)
- Pre-SQL and Post-SQL has to be provided as multi row. this has to be tested.
- Any formatting /extract char which excel may introduce may have impact to job yml generated
- While using clauses, use them in order. (eg.: In excel sheet, after sttm query, join clause row must appear in before order by clause row)
- Use exact syntax of corresponding technology
- If there are other columns, besides specified in sttm_columns conf, will be ignored. [in STTM table, in sttm sheet]
- Data above the 'header_row_index' in 'additional column sheet' and 'truncate partition column' sheet, will be ignored.
- Continuous 10 blank cells in first column in any sheet will be considered as the end of sheet (when 'header_row_index' is specified, it will count it after 'header_row_index').
- This app is tightly coupled with config file and the excel template. So your config and template has to be hand in hand.