Guzzle Repository Schema Changes in 2.0 - ja-guzzle/guzzle_docs GitHub Wiki

Changes to Job Audit tables

Type Issue # Table Column Data Type Modification Purpose
Audit ingestion#184 job_info discard_records bigint New column Capturing the new column for discarded records.
Audit Changes in Sampling feature cluster_heartbeat cluster_id bigint New table Changes in design of Data sampling
Audit Changes in Sampling feature cluster_heartbeat heartbeat datetime New table Changes in design of Data sampling
Audit Changes in Sampling feature cluster_heartbeat status varchar(20) New table Changes in design of Data sampling
Audit Changes in Sampling feature data_sampling_job cluster_id bigint New column Changes in design of Data sampling
Audit Changes in Sampling feature data_sampling_job created_time datetime New column Changes in design of Data sampling
Audit Changes in Sampling feature data_sampling_job action varchar(20) New column Changes in design of Data sampling
Audit guzzle_common#309 job_info message varchar(8000) Modification Long error messages in job_info
Audit guzzle_common#309 job_info_ext message varchar(8000) Modification Long error messages in job_info
Audit guzzle_common#461 job_info_param parameter_value text Modification Store the dependency graph in this table
Audit guzzle_common#461 job_info executor_id varchar(200) Modification Store the dependency graph in this table
Audit guzzle_common#328 job_info_ext module varchar(200) Modification Made part of PK (and hence non null). ADF sync utility and ADF extenral jobs will bring pipeline level stats and the module name is included to distinguish both this
Audit guzzle_common#328 job_info_param_ext module varchar(200) New column Made part of PK (and hence non null). ADF sync utility and ADF extenral jobs will bring pipeline level stats and the module name is included to distinguish both this
API guzzle_common#429 recent_files user_id bigint New table Home page functionality, storing recently modified configs
API guzzle_common#429 recent_files config_type varchar(20) New table Home page functionality, storing recently modified configs
API guzzle_common#429 recent_files config_name varchar(200) New table Home page functionality, storing recently modified configs
API guzzle_common#429 recent_files last_access_date datetime New table Home page functionality, storing recently modified configs
API guzzle_common#333 users user_type varchar(255) New column External user feature
API Changes in Sampling feature user_default_spark_env user_id bigint New table New table added to track the last sampling env
API Changes in Sampling feature user_default_spark_env spark_env_name varchar(100) New table New table added to track the last sampling env
API Changes in Sampling feature user_default_spark_env created_time datetime New table New table added to track the last sampling env
API Changes in Sampling feature user_default_spark_env updated_time datetime New table New table added to track the last sampling env

Upgrade path for existing repo

  1. For API, the changes will be automatically done. They will not impact 1.0 env which points to this.
  2. Run ALTER commands for Repository schema (no impact to existing columns expand they are being expanded)

Known issues

  1. Batch records will now show up in the new UI for the runs done using 1.0 as 1.0 did not have record in the job_info table for the batch. guzzle_common#378. If this records are to be visible in new UI then we have to create them in JOB_INFO with tag as batch (not recommended)

Alter commands for SQL Server

CREATE TABLE [dbo].[cluster_heartbeat](
	[cluster_id] [bigint] NOT NULL,
	[heartbeat] [datetime] NULL,
	[status] [varchar](20) NULL,
PRIMARY KEY CLUSTERED 
(
	[cluster_id] ASC
))
;

ALTER TABLE [dbo].[data_sampling_job] add 
	[cluster_id] [bigint] NOT NULL,
	[created_time] [datetime] NULL,
	[action] [varchar](20) NOT NULL
;


ALTER TABLE [dbo].[job_info] alter column
[message] [varchar](8000) ;

ALTER TABLE [dbo].[job_info] add
[executor_id] [varchar](200) NULL;

ALTER TABLE [dbo].[job_info] add
discard_records bigint;

ALTER TABLE [dbo].constraint_check_summary add
total_count bigint;

	
	

ALTER TABLE [dbo].[job_info_ext] alter column
[message] [varchar](8000) ;


ALTER TABLE [dbo].[job_info_ext] alter column
[module] [varchar](200) not null;


ALTER TABLE [dbo].[job_info_param] alter column
[parameter_value] [text]
;


ALTER TABLE [dbo].[job_info_param_ext] ADD
[module] [varchar](200) NOT NULL;


DECLARE @SQL VARCHAR(4000)
SET @SQL = 'ALTER TABLE job_info_param_ext DROP CONSTRAINT |ConstraintName| '

SET @SQL = REPLACE(@SQL, '|ConstraintName|', ( SELECT   name
                                               FROM     sysobjects
                                               WHERE    xtype = 'PK'
                                                        AND parent_obj = OBJECT_ID('[job_info_param_ext]')
                                             ))

EXEC (@SQL)



ALTER TABLE [dbo].[job_info_param_ext] ADD CONSTRAINT pk_job_info_param_ext
PRIMARY KEY CLUSTERED 
(
	[job_instance_id] ASC,
	[module] ASC,
	[parameter_name] ASC
)
;


SET @SQL = 'ALTER TABLE job_info_ext DROP CONSTRAINT |ConstraintName| '

SET @SQL = REPLACE(@SQL, '|ConstraintName|', ( SELECT   name
                                               FROM     sysobjects
                                               WHERE    xtype = 'PK'
                                                        AND parent_obj = OBJECT_ID('[job_info_ext]')
                                             ))

EXEC (@SQL)


ALTER TABLE [dbo].[job_info_ext] add CONSTRAINT pk_job_info_ext
PRIMARY KEY 
(
	[job_instance_id] ASC,
	[module] ASC
)
;
⚠️ **GitHub.com Fallback** ⚠️