cde_val - Hari-dta/hari GitHub Wiki

Write a PySpark script.

The source will be a CSV file like this:

Column A,Combined Values
source_system,"Derived: 'CL' (constant), ifrs9.committed_unutilized_v.LCPF_LOAD_COUNTRY (hardcoded as 'Unused Limits'), ifrs9.committed_unutilized_v.source_system, ifrs9.deals_funded_unfunded_report.deal_type"
facility_id,"ifrs9.committed_unutilized_v.LCPF_CMTMNT_REF, ifrs9.deals_funded_unfunded_report.facility_ref, ifrs9.deals_funded_unfunded_report_all_lg.facility_ref"
customer_id,"ifrs9.committed_unutilized_v.LCPF_CSTMR_MNMNC, ifrs9.deals_funded_unfunded_report.customer_equation_id_new, ifrs9.deals_funded_unfunded_report.customer_equation_id_new, ifrs9.deals_funded_unfunded_report.customer_id_new, ifrs9.deals_funded_unfunded_report_all_lg.customer_id or UNFUN_CUST.customer_equation_id"
branch,"RAW_DATA_VAULT.BRANCHES.CAPF_BRANCH_NUMBER, RAW_DATA_VAULT.BRANCHES.CAPF_BRNCH_NUMBER, ifrs9.deals_funded_unfunded_report.account_branch_number_new, ifrs9.deals_funded_unfunded_report_all_lg.account_branch_number"
basic,"ifrs9.committed_unutilized_v.LCPF_CSTMR_MNMNC, ifrs9.deals_funded_unfunded_report.external_account_number_new, ifrs9.deals_funded_unfunded_report_all_lg.external_account_number"
suffix,"ifrs9.committed_unutilized_v.SUFFIX, ifrs9.deals_funded_unfunded_report.account_suffix_new, ifrs9.deals_funded_unfunded_report_all_lg.account_suffix"
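The Combined Values cells are not uniform. Besides plain schema.table.column references, they contain a constant ("Derived: 'CL' (constant)"), annotations ("(hardcoded as 'Unused Limits')"), a duplicated entry (customer_equation_id_new appears twice for customer_id) and an alternative joined by "or" whose second half (UNFUN_CUST.customer_equation_id) has no schema. A minimal sketch of splitting one cell in plain Python, assuming every real reference is three dot-separated identifiers; `extract_refs` and `REF_PATTERN` are hypothetical names. Entries that do not fit the pattern are returned separately instead of being dropped.

```python
import re
from typing import List, Tuple

# Assumed shape of a reference: schema.table.column, each part a plain identifier.
REF_PATTERN = re.compile(r"\b([A-Za-z_]\w*)\.([A-Za-z_]\w*)\.([A-Za-z_]\w*)\b")


def extract_refs(cell: str) -> Tuple[List[Tuple[str, str, str]], List[str]]:
    """Split one 'Combined Values' cell into (schema, table, column) triples.

    Returns the triples in order of first appearance (duplicates removed) and
    the entries that are not schema.table.column, so nothing is lost silently.
    """
    refs: List[Tuple[str, str, str]] = []
    unparsed: List[str] = []
    for piece in cell.split(","):
        # "a.b.c or X.y" lists alternatives; handle each one on its own.
        for token in re.split(r"\s+or\s+", piece.strip()):
            token = token.strip()
            if not token:
                continue
            match = REF_PATTERN.search(token)
            if match is None:
                unparsed.append(token)  # constants, alias.column references
            elif match.groups() not in refs:
                refs.append(match.groups())
    return refs, unparsed
```

On the source_system row this yields three references and reports "Derived: 'CL' (constant)" as unparsed; the trailing "(hardcoded as ...)" note is ignored because only the leading three-part name is extracted.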

The PySpark script has to read this file and take the Combined Values column, whose entries are in the format schema.tablename.column_name.
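Reading the file and splitting the Combined Values column can be done inside Spark, so every entry of a multi-value cell becomes its own row. The quoted Combined Values field contains commas, but Spark's CSV reader keeps it as one field because `"` is its default quote character. This is a sketch under assumptions: the header names are exactly "Column A" and "Combined Values" as in the sample, and `explode_refs`, `SPLIT_PATTERN`, `REF_PATTERN` and the output column names are hypothetical. Spark's `split` and `regexp_extract` use Java regex; the patterns are kept simple enough to behave the same in Python's `re`.

```python
# Entries are separated by commas or by the word "or" between alternatives.
SPLIT_PATTERN = r",|\s+or\s+"
# Assumed reference shape: schema.table.column.
REF_PATTERN = r"([A-Za-z_]\w*)\.([A-Za-z_]\w*)\.([A-Za-z_]\w*)"


def explode_refs(spark, csv_path):
    """Read the mapping CSV and return one row per entry of 'Combined Values'.

    Entries that are not schema.table.column (constants, alias.column) keep
    empty schema_name/table_name/column_name so they can be reviewed.
    """
    from pyspark.sql import functions as F  # imported here; needs PySpark

    mapping = (
        spark.read.option("header", True)  # first line: Column A,Combined Values
        .option("quote", '"')              # commas inside quotes stay in one field
        .csv(csv_path)
    )
    entries = mapping.select(
        F.col("Column A").alias("target_column"),
        F.explode(F.split(F.col("Combined Values"), SPLIT_PATTERN)).alias("entry"),
    ).withColumn("entry", F.trim("entry"))
    return (
        entries.where(F.col("entry") != "")
        # regexp_extract returns "" when the entry does not match
        .withColumn("schema_name", F.regexp_extract("entry", REF_PATTERN, 1))
        .withColumn("table_name", F.regexp_extract("entry", REF_PATTERN, 2))
        .withColumn("column_name", F.regexp_extract("entry", REF_PATTERN, 3))
        .dropDuplicates(["target_column", "entry"])  # e.g. repeated customer_equation_id_new
    )
```

For example, `explode_refs(spark, "mapping.csv").where("schema_name = ''")` (the path is a placeholder) lists the entries that could not be read as schema.table.column.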

It should then check whether each column exists in that table of the respective schema.
P.S. A cell can contain multiple Hive columns, so make sure all of them are read.
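The existence check can be split into a catalog lookup that needs Spark and a comparison that does not, which keeps the comparison testable without a cluster. A minimal sketch, assuming a SparkSession built with Hive support and Hive's case-insensitive column names (the metastore stores them lowercased, while the sample uses names like LCPF_CMTMNT_REF). `load_catalog`, `check_refs` and the status strings are hypothetical; a missing schema and a missing table are both reported as TABLE_NOT_FOUND.

```python
from typing import Dict, Iterable, List, Set, Tuple

Ref = Tuple[str, str, str]  # (schema, table, column)


def load_catalog(spark, tables: Iterable[str]) -> Dict[str, Set[str]]:
    """Look up the columns of each "schema.table" through a live SparkSession.

    Schemas or tables that cannot be resolved are left out of the result.
    """
    try:
        from pyspark.errors import AnalysisException  # PySpark >= 3.4
    except ImportError:
        from pyspark.sql.utils import AnalysisException  # older PySpark

    catalog: Dict[str, Set[str]] = {}
    for name in sorted({t.lower() for t in tables}):
        try:
            # spark.table is lazy: .columns only resolves the schema, no data is read.
            catalog[name] = {c.lower() for c in spark.table(name).columns}
        except AnalysisException:
            pass  # check_refs will report this table as TABLE_NOT_FOUND
    return catalog


def check_refs(refs: Iterable[Ref], catalog: Dict[str, Set[str]]) -> List[Tuple[str, str, str, str]]:
    """Tag each reference OK, COLUMN_NOT_FOUND or TABLE_NOT_FOUND (case-insensitive)."""
    results = []
    for schema, table, column in refs:
        columns = catalog.get(f"{schema}.{table}".lower())
        if columns is None:
            status = "TABLE_NOT_FOUND"
        elif column.lower() in columns:
            status = "OK"
        else:
            status = "COLUMN_NOT_FOUND"
        results.append((schema, table, column, status))
    return results
```

Usage: with `refs = [("ifrs9", "committed_unutilized_v", "LCPF_CMTMNT_REF")]`, call `check_refs(refs, load_catalog(spark, {f"{s}.{t}" for s, t, _ in refs}))`. Each table is looked up once, however many of its columns appear across the cells.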