Troubleshooting Guzzle 2.0 Environment
- Guzzle Home
- Verifying GUZZLE_HOME
- Unmount/mount GUZZLE_HOME manually
- To do a forceful unmount of guzzle_home
- To run this as root (or as a sudo user, which is the VM admin)
- Databricks Workspace setup
- Incorrect Guzzle home
- Unreachable Guzzle repository database
- Author & Admin page don't show up in the UI
- Root cause
- Resolve
- Fail to fetch runtime audit after Guzzle upgrade
- Root cause
- Resolve
- Set up shared storage
- Call Guzzle from ADF
- Guideline
- config nginx
- Stage pairs
- Meaning: stage pairs ensure that under no circumstances is the next day's FND allowed to run until the previous day's PLP has completed
- There are two concepts:
- Guzzle's default behavior is this:
- scd2 test
- create table in delta
- insert data into table
- perform scd2
- job heart beat explanation
- job heart beat
- the reasons heartbeats may be lost
- Monitoring UI issues
- Schema is not up to date
Guzzle Home

Note: All the steps below are to be done using the guzzle account.
- How to check whether Guzzle home is mounted on shared storage. There are two ways to verify this (either one is fine):
- Check the disk behind /mount/guzzle:

```
df -k /mount/guzzle
```

Expected output:

```
guzzle@guzzletrial309:/opt/guzzlescript$ df -k /mount/guzzle
Filesystem     1K-blocks    Used Available Use% Mounted on
blobfuse        30309264 7926976  22365904  27% /mount/guzzle
```
- Check that the blobfuse process is running:

```
ps -ef | grep "blobfuse \/mount\/guzzle"
```

Expected output:

```
guzzle@guzzletrial309:/opt/guzzlescript$ ps -ef | grep "blobfuse \/mount\/guzzle"
guzzle   23357     1  1 03:11 ?        00:00:43 blobfuse /mount/guzzle --tmp-path=/mount/blobfusetmp --container-name=guzzletrial309gh -o attr_timeout=240 -o entry_timeout=240 -o negative_timeout=120 --file-cache-timeout-in-seconds=10
```
- Additionally, check that you see all the files in /mount/guzzle:

```
guzzle@guzzletrial309:/opt/guzzlescript$ ls -lthr /mount/guzzle
total 16K
drwxrwx--- 2 guzzle guzzle 4.0K Sep 28 05:06 hive-connectors
drwxrwx--- 2 guzzle guzzle 4.0K Sep 28 05:06 scripts
drwxrwx--- 2 guzzle guzzle 4.0K Sep 28 05:06 test-data
drwxrwx--- 2 guzzle guzzle 4.0K Oct 22 03:36 logs
drwxrwx--- 2 guzzle guzzle 4.0K Oct 22 03:36 libs
drwxrwx--- 2 guzzle guzzle 4.0K Oct 22 03:36 bin
drwxrwx--- 5 guzzle guzzle 4.0K Oct 22 03:55 conf
```
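If you need to script this check (for example, from monitoring), here is a minimal sketch combining all three verifications; paths match the examples above:

```bash
#!/usr/bin/env bash
# Minimal mount-health check for Guzzle home; paths as in the examples above.
GUZZLE_HOME_MOUNT=/mount/guzzle

# 1. A blobfuse filesystem should be mounted at the expected path.
df -k "$GUZZLE_HOME_MOUNT" | grep -q '^blobfuse ' \
  || { echo "blobfuse is not mounted at $GUZZLE_HOME_MOUNT"; exit 1; }

# 2. The blobfuse process should be running.
pgrep -f "blobfuse $GUZZLE_HOME_MOUNT" > /dev/null \
  || { echo "blobfuse process is not running"; exit 1; }

# 3. The expected directories (bin, conf, libs, ...) should be visible.
ls "$GUZZLE_HOME_MOUNT/conf" > /dev/null \
  || { echo "conf directory is missing under $GUZZLE_HOME_MOUNT"; exit 1; }

echo "Guzzle home mount looks healthy"
```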
- Unmount Guzzle home manually
- Ensure you don't have any terminal/shell session whose current working directory is /mount/guzzle or any subdirectory of it, otherwise the unmount will fail.
- Do the unmount using the command below. Once unmounted, run the steps above to verify the Guzzle home mount; they should show that it is unmounted.

```
fusermount -u /mount/guzzle
```
- To mount Guzzle home manually and see specific errors, run:

```
export AZURE_STORAGE_ACCOUNT=<storage account name>
export AZURE_STORAGE_ACCESS_KEY="<storage account key>"
blobfuse /mount/guzzle --tmp-path=/mount/blobfusetmp --container-name=<containername> -o attr_timeout=240 -o entry_timeout=240 -o negative_timeout=120 --file-cache-timeout-in-seconds=10
```
To do a forceful unmount of guzzle_home:

```
# To run this as root (or as a sudo user, which is the VM admin)
umount -l /mount/guzzle
```

Then stop and restart the API.
Databricks Workspace setup

- Databricks Job clusters rely on the "DBFS Guzzle Directory" setting in Compute.
- Analytics clusters rely on two settings, as well as the GUZZLE_HOME environment variable on the cluster. Notice that there is a /dbfs/ prefix, as this GUZZLE_HOME is specified as a Unix directory and not a DBFS directory.

Note: you can't share the same Analytics cluster across two Guzzle environments, for the following reasons:
- The guzzle_home for the two environments WILL be different, and since the environment variable is set once, at Analytics cluster creation time, it can't be changed.
- The libraries are installed once and are not removed after a job run finishes. If the two Guzzle environments are on different versions, this may cause issues.
- Databricks jobs will not run correctly if the Guzzle environment is pointing to the wrong repository database.
Author & Admin page don't show up in the UI

Root cause: missing directory /mount/blobfusetmp.

Resolve (a sketch of the full sequence follows this list):
- Create the missing directory
- Unmount the blob
- Mount the blob manually
- Run the startup script
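A sketch of that recovery sequence, assuming the same paths and storage account as in the sections above (the startup script location is not shown on this page, so it is left as a placeholder):

```bash
# 1. Create the missing blobfuse temp directory (ownership is an assumption;
#    adjust if blobfuse runs under a different account).
sudo mkdir -p /mount/blobfusetmp
sudo chown guzzle:guzzle /mount/blobfusetmp

# 2. Unmount the blob (use "sudo umount -l /mount/guzzle" if this fails).
fusermount -u /mount/guzzle

# 3. Mount the blob manually to surface any specific errors.
export AZURE_STORAGE_ACCOUNT=<storage account name>
export AZURE_STORAGE_ACCESS_KEY="<storage account key>"
blobfuse /mount/guzzle --tmp-path=/mount/blobfusetmp --container-name=<containername> \
  -o attr_timeout=240 -o entry_timeout=240 -o negative_timeout=120 \
  --file-cache-timeout-in-seconds=10

# 4. Run the startup script (path is a placeholder).
<path to startup script>
```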
Fail to fetch runtime audit after Guzzle upgrade

Root cause: change in the Guzzle runtime audit database table schema.

Resolve: run this script on the runtime audit database:

```sql
ALTER TABLE [dbo].[job_info] ADD discard_records bigint;

ALTER TABLE [dbo].[constraint_check_summary] ADD total_count bigint;
```
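To confirm the columns were added, you can run a quick check against the runtime audit database, for example via sqlcmd (connection details are placeholders):

```bash
# Both rows should come back if the ALTER statements succeeded.
sqlcmd -S <server> -d <runtime audit database> -U <user> -P <password> -Q "
SELECT table_name, column_name
FROM information_schema.columns
WHERE (table_name = 'job_info' AND column_name = 'discard_records')
   OR (table_name = 'constraint_check_summary' AND column_name = 'total_count');"
```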
Set up shared storage

You have to put a forward slash here, otherwise it will return an "unexpected error":

```
GUZZLE_HOME=/mount/guzzle
```
Call Guzzle from ADF

Guideline: https://justanalytics.github.io/guzzle-docs/docs/next/parameter_datafactory

Create a fresh user in Admin -> Security -> Users.
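For illustration, an ADF Web activity would then call the Guzzle REST API with that user's credentials through the nginx proxy configured below. The host, endpoint path, and payload here are placeholders, not the actual API contract; see the guideline link above for the real parameters:

```bash
# Hypothetical call shape only: host, endpoint and payload are placeholders.
curl -sk -u <api user>:<password> \
  -H "Content-Type: application/json" \
  -X POST "https://<guzzle host>/api/<run endpoint>" \
  -d '{"<parameter>": "<value>"}'
```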
config nginx

root@guzzletrial309:~# vi /etc/nginx/sites-available/default

```nginx
server {
    #listen 80;
    listen 443 ssl http2;
    listen [::]:443 ssl http2;

    root /guzzle/web;

    ssl_certificate /home/guzzle/certs/localhost.crt;
    ssl_certificate_key /home/guzzle/certs/localhost.key;

    # Add index.php to the list if you are using PHP
    index index.html index.htm index.nginx-debian.html;

    server_name _;

    location / {
        try_files $uri /index.html;
    }

    location /api/ {
        proxy_pass http://localhost:9090;
        proxy_set_header Expect $http_expect;
    }

    location /atlas/ {
        proxy_pass http://localhost:21001;
    }
}
```
```
sudo systemctl restart nginx
```
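You can validate the config before restarting, then smoke-test the proxy (the hostname and the -k flag for the self-signed localhost certificate are assumptions):

```bash
sudo nginx -t    # validate the edited config before (re)starting nginx
# After the restart, both the UI and the API proxy should respond:
curl -sk -o /dev/null -w 'UI:  %{http_code}\n' https://localhost/
curl -sk -o /dev/null -w 'API: %{http_code}\n' https://localhost/api/
```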
Stage pairs

Meaning: stage pairs ensure that under no circumstances is the next day's FND allowed to run until the previous day's PLP has completed.

When you run the stages CBS: STG, FND, PLP and assume you have 3 open batches, it will try to run D1: STG, FND, PLP, then D2: STG, FND, PLP, and then D3: STG, FND, PLP.

Now if D2 FND fails, it won't proceed to D2 PLP, and also not to D3 STG, FND and PLP.

But usually when people create independent stages, they also run them independently: submit Run Stages for STG as a separate API call, then FND, and then PLP. In that case, all three days of STG will run, then all three days of FND will be attempted, and since D2 FND has failed it won't move to D3. But when you run PLP, assuming there is no issue/error in the PLP jobs, it will simply run all three days of PLP.

If you want to strictly ensure that by no means should D3 FND run until D2 PLP is clear, then define the pairing. Otherwise, if someone runs the stages STG, FND and PLP separately, it will blindly run FND for all three days.
It always runs batches in chronological order of date. Also, it runs a stage only if the previous stage is successful for that day.

But if you also want to build a dependency on the next stage (for example, PLP should succeed before the next day's FND runs), then you use a stage pair.

- If you run the stages STG-FND-PLP together, it will first take up FND for D2, then move to PLP for D2, and then to D3.
- If you run stage STG alone, it will run D3 STG.
- If you run STG-FND, it will run FND for D2 and then move to STG for D3.
- If you run FND, it will run FND for D2 and then move to FND for D3.
- If you run PLP, it will run PLP for D2 and then, having nothing further to do, exit.

You can't configure it at the job group or stage level. When you run jobs via a stage (which we all do), all the job groups in that stage run with the resume flag. Hence if Group1 finished and Group2 failed, it will rerun Group2 from the very job that failed (don't mix this up with Rerun, which allows running batches for a historical period while the current period has already run successfully).

Resume stage is applied to the stage(s) that you are running from the API. If you are running one particular stage, it will try to resume that particular stage; for other stages (assuming you are running them separately) you can pass resume stage = N and it will run that particular failed stage from the beginning.
scd2 test

Create the table in Delta:

```sql
use default;

create table tgt_scd2 (id int, c1 string, c2 string, eff_start_dt timestamp, eff_end_dt timestamp, curr_rec string, seq_key bigint);

--truncate table tgt_scd2;
--insert into tgt_scd2 select -1,null,null,current_timestamp, null,'Y',0;

select * from tgt_scd2;
```
Insert data into the table (initial load):

```yaml
version: 1
job:
  type: processing
  source:
    endpoint: hive
    incremental: false
    properties:
      sql: |-
        select 1 id, 'a' c1, 'b' c2 union all
        select 2 id, 'a' c1, 'b' c2 union all
        select 3 id, 'a' c1, 'b' c2 union all
        select 4 id, 'a' c1, 'b' c2
      additional_columns:
        - name: eff_start_dt
          framework_column: w_eff_start_date_ts
          framework_generated: true
        - name: eff_end_dt
          framework_column: w_eff_end_date_ts
          framework_generated: true
        - name: curr_rec
          framework_column: w_current_record_flag
          framework_generated: true
        - name: seq_key
          framework_column: w_sequence_key
          framework_generated: true
  target:
    primary_key_columns:
      - id
    operation: effective_date_merge
    soft_delete: false
    properties:
      template: default
      table: tgt_scd2
      history_columns:
        - c1
        - c2
    endpoint: hive
```
Perform scd2 (second run; rows 2 and 4 change c1 from 'a' to 'a1'):

```yaml
version: 1
job:
  type: processing
  source:
    endpoint: hive
    incremental: false
    properties:
      sql: |-
        select 1 id, 'a' c1, 'b' c2 union all
        select 2 id, 'a1' c1, 'b' c2 union all
        select 3 id, 'a' c1, 'b' c2 union all
        select 4 id, 'a1' c1, 'b' c2
      additional_columns:
        - name: eff_start_dt
          framework_column: w_eff_start_date_ts
          framework_generated: true
        - name: eff_end_dt
          framework_column: w_eff_end_date_ts
          framework_generated: true
        - name: curr_rec
          framework_column: w_current_record_flag
          framework_generated: true
        - name: seq_key
          framework_column: w_sequence_key
          framework_generated: true
  target:
    primary_key_columns:
      - id
    operation: effective_date_merge
    soft_delete: false
    properties:
      template: default
      table: tgt_scd2
      history_columns:
        - c1
        - c2
    endpoint: hive
```
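After running both jobs, you can inspect the target to confirm the SCD2 behavior; one way is the spark-sql CLI (how you reach the hive endpoint is an assumption):

```bash
spark-sql -e "SELECT id, c1, c2, eff_start_dt, eff_end_dt, curr_rec, seq_key
              FROM tgt_scd2 ORDER BY id, eff_start_dt;"
# Expected shape: ids 1 and 3 keep a single current row (c1 = 'a');
# ids 2 and 4 have the original row end-dated (curr_rec = 'N') plus a new
# current row with c1 = 'a1', since c1 is listed in history_columns.
```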
job heart beat explanation

As part of the job, we have a lightweight thread which sends a heartbeat to the API/UI every 5 seconds. The API compares the last heartbeat of a running job with the system time, and if it is more than 30 minutes old it marks the job as Aborted; however, the underlying job in Databricks is not cancelled and may still be running.

The heartbeats could be lost for the following reasons:
i. Increased workload on the driver (e.g., the GC thread taking up resources for a long period)
ii. Database unavailability

However, since the job is still running, if it completes successfully the status is changed back to SUCCESS (from ABORTED). The underlying job group will not close itself; it will show status RUNNING and will not proceed to the next job until the Databricks job itself has finished (as Guzzle keeps checking the status using the Databricks URL). Since the rerun of the job took approx. 12 minutes today at 11am, we will monitor this further.
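For reference, here is the timeout rule described above as a sketch (the 5-second interval and 30-minute threshold come from the text; the input is hypothetical):

```bash
# Sketch of the abort rule: a running job whose last heartbeat is older than
# 30 minutes is marked ABORTED (the Databricks job itself is NOT cancelled).
HEARTBEAT_INTERVAL_SECS=5
ABORT_THRESHOLD_SECS=$((30 * 60))
last_heartbeat_epoch=$1   # hypothetical input: epoch seconds of last heartbeat
now=$(date +%s)
if (( now - last_heartbeat_epoch > ABORT_THRESHOLD_SECS )); then
  echo "mark job ABORTED"
fi
```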
On a separate note, you can upgrade to 0.7.38 in UAT and then prod, as that is the latest version and has a minor update to support resuming batches from the failed job rather than from the beginning when guzzle.stage.resume = Y.
Monitoring UI issues

- Schema is not up to date. Example: discard_records was missing, which led to an exception.
- There was an outdated jar in /guzzle/api/libs named external.jar which conflicted with other jars, resulting in a "method not found" error.
- When Guzzle is upgraded but the web app is still on a stale version, this can happen. Example scenario below:
```
2020-10-30 15:17:07.853 INFO 3477 --- [http-nio-9090-exec-10] c.j.g.a.s.a.AbstractJdbcJobAuditService : search jobs query : SELECT TOP 1000 job_info_records.*, DATEDIFF(SECOND, start_time, end_time) AS duration FROM ( SELECT DISTINCT(activity_job_instance_id) as job_instance_id FROM ( SELECT activity.job_instance_id as activity_job_instance_id, activity.name as activity_name, activity.module as activity_module, activity.tag as activity_tag, activity.parent_job_instance_id as activity_parent_job_instance_id, activity.status as status, activity.business_date as business_date, activity.start_time as start_time, activity.end_time as end_time, activity.message as message FROM ( SELECT * FROM job_info WHERE tag='pull' ) activity ) job_detail ) unique_job_instance_ids LEFT JOIN job_info job_info_records ON unique_job_instance_ids.job_instance_id=job_info_records.job_instance_id ORDER BY jobId>jobInstanceId desc
2020-10-30 15:17:07.855 ERROR 3477 --- [http-nio-9090-exec-10] o.a.c.c.C.[.[.[/].[dispatcherServlet] : Servlet.service() for servlet [dispatcherServlet] in context with path [] threw exception [Request processing failed; nested exception is java.lang.reflect.UndeclaredThrowableException] with root cause
com.microsoft.sqlserver.jdbc.SQLServerException: Incorrect syntax near '>'.
at com.microsoft.sqlserver.jdbc.SQLServerException.makeFromDatabaseError(SQLServerException.java:262)
at com.microsoft.sqlserver.jdbc.SQLServerStatement.getNextResult(SQLServerStatement.java:1624)
at com.microsoft.sqlserver.jdbc.SQLServerStatement.doExecuteStatement(SQLServerStatement.java:868)
at com.microsoft.sqlserver.jdbc.SQLServerStatement$StmtExecCmd.doExecute(SQLServerStatement.java:768)
at com.microsoft.sqlserver.jdbc.TDSCommand.execute(IOBuffer.java:7194)
at com.microsoft.sqlserver.jdbc.SQLServerConnection.executeCommand(SQLServerConnection.java:2979)
at com.microsoft.sqlserver.jdbc.SQLServerStatement.executeCommand(SQLServerStatement.java:248)
at com.microsoft.sqlserver.jdbc.SQLServerStatement.executeStatement(SQLServerStatement.java:223)
at com.microsoft.sqlserver.jdbc.SQLServerStatement.executeQuery(SQLServerStatement.java:693)
at com.zaxxer.hikari.pool.ProxyStatement.executeQuery(ProxyStatement.java:111)
at com.zaxxer.hikari.pool.HikariProxyStatement.executeQuery(HikariProxyStatement.java)
at com.justanalytics.guzzle.api.service.audit.AbstractJdbcJobAuditService$$anonfun$searchJobInfo$6.apply(AbstractJdbcJobAuditService.scala:234)
at com.justanalytics.guzzle.api.service.audit.AbstractJdbcJobAuditService$$anonfun$searchJobInfo$6.apply(AbstractJdbcJobAuditService.scala:232)
at com.justanalytics.guzzle.common.exception.AutoCloseableResource$.withResources(AutoCloseableResource.scala:12)
at com.justanalytics.guzzle.api.service.audit.AbstractJdbcJobAuditService.searchJobInfo(AbstractJdbcJobAuditService.scala:232)
at com.justanalytics.guzzle.api.web.rest.BatchController.searchJobInfo(BatchController.scala:102)
at com.justanalytics.guzzle.api.web.rest.BatchController$$FastClassBySpringCGLIB$$116252d0.invoke(<generated>)
at org.springframework.cglib.proxy.MethodProxy.invoke(MethodProxy.java:204)
at org.springframework.aop.framework.CglibAopProxy$DynamicAdvisedInterceptor.intercept(CglibAopProxy.java:669)
at com.justanalytics.guzzle.api.web.rest.BatchController$$EnhancerBySpringCGLIB$$8eddb656.searchJobInfo(<generated>)
at sun.reflect.GeneratedMethodAccessor313.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.springframework.web.method.support.InvocableHandlerMethod.doInvoke(InvocableHandlerMethod.java:205)
at org.springframework.web.method.support.InvocableHandlerMethod.invokeForRequest(InvocableHandlerMethod.java:133)
at org.springframework.web.servlet.mvc.method.annotation.ServletInvocableHandlerMethod.invokeAndHandle(ServletInvocableHandlerMethod.java:97)
at org.springframework.web.servlet.mvc.method.annotation.RequestMappingHandlerAdapter.invokeHandlerMethod(RequestMappingHandlerAdapter.java:827)
at org.springframework.web.servlet.mvc.method.annotation.RequestMappingHandlerAdapter.handleInternal(RequestMappingHandlerAdapter.java:738)
at org.springframework.web.servlet.mvc.method.AbstractHandlerMethodAdapter.handle(AbstractHandlerMethodAdapter.java:85)
at org.springframework.web.servlet.DispatcherServlet.doDispatch(DispatcherServlet.java:967)
at org.springframework.web.servlet.DispatcherServlet.doService(DispatcherServlet.java:901)
at org.springframework.web.servlet.FrameworkServlet.processRequest(FrameworkServlet.java:970)
at org.springframework.web.servlet.FrameworkServlet.doPost(FrameworkServlet.java:872)
at javax.servlet.http.HttpServlet.service(HttpServlet.java:661)
at org.springframework.web.servlet.FrameworkServlet.service(FrameworkServlet.java:846)
at javax.servlet.http.HttpServlet.service(HttpServlet.java:742)
at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:231)
at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:166)
at org.apache.tomcat.websocket.server.WsFilter.doFilter(WsFilter.java:52)
at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:193)
at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:166)
at org.springframework.security.web.FilterChainProxy$VirtualFilterChain.doFilter(FilterChainProxy.java:317)
at org.springframework.security.web.access.intercept.FilterSecurityInterceptor.invoke(FilterSecurityInterceptor.java:127)
at org.springframework.security.web.access.intercept.FilterSecurityInterceptor.doFilter(FilterSecurityInterceptor.java:91)
at org.springframework.security.web.FilterChainProxy$VirtualFilterChain.doFilter(FilterChainProxy.java:331)
at org.springframework.security.web.access.ExceptionTranslationFilter.doFilter(ExceptionTranslationFilter.java:114)
at org.springframework.security.web.FilterChainProxy$VirtualFilterChain.doFilter(FilterChainProxy.java:331)
at org.springframework.security.web.session.SessionManagementFilter.doFilter(SessionManagementFilter.java:137)
at org.springframework.security.web.FilterChainProxy$VirtualFilterChain.doFilter(FilterChainProxy.java:331)
at org.springframework.security.web.authentication.AnonymousAuthenticationFilter.doFilter(AnonymousAuthenticationFilter.java:111)
at org.springframework.security.web.FilterChainProxy$VirtualFilterChain.doFilter(FilterChainProxy.java:331)
at org.springframework.security.web.servletapi.SecurityContextHolderAwareRequestFilter.doFilter(SecurityContextHolderAwareRequestFilter.java:170)
at org.springframework.security.web.FilterChainProxy$VirtualFilterChain.doFilter(FilterChainProxy.java:331)
at org.springframework.security.web.savedrequest.RequestCacheAwareFilter.doFilter(RequestCacheAwareFilter.java:63)
at org.springframework.security.web.FilterChainProxy$VirtualFilterChain.doFilter(FilterChainProxy.java:331)
at com.justanalytics.guzzle.api.security.jwt.BasicAuthenticationFilter.doFilter(BasicAuthenticationFilter.scala:38)
at org.springframework.security.web.FilterChainProxy$VirtualFilterChain.doFilter(FilterChainProxy.java:331)
at com.justanalytics.guzzle.api.security.jwt.JWTAuthenticationFilter.doFilter(JWTAuthenticationFilter.scala:24)
at org.springframework.security.web.FilterChainProxy$VirtualFilterChain.doFilter(FilterChainProxy.java:331)
at com.justanalytics.guzzle.api.security.jwt.AuthenticationFilter.doFilter(AuthenticationFilter.scala:37)
at org.springframework.security.web.FilterChainProxy$VirtualFilterChain.doFilter(FilterChainProxy.java:331)
at com.justanalytics.guzzle.api.filter.ApplicationStatusFilter.doFilter(ApplicationStatusFilter.scala:30)
at org.springframework.security.web.FilterChainProxy$VirtualFilterChain.doFilter(FilterChainProxy.java:331)
at org.springframework.security.web.authentication.logout.LogoutFilter.doFilter(LogoutFilter.java:116)
at org.springframework.security.web.FilterChainProxy$VirtualFilterChain.doFilter(FilterChainProxy.java:331)
at org.springframework.security.web.header.HeaderWriterFilter.doFilterInternal(HeaderWriterFilter.java:64)
at org.springframework.web.filter.OncePerRequestFilter.doFilter(OncePerRequestFilter.java:107)
at org.springframework.security.web.FilterChainProxy$VirtualFilterChain.doFilter(FilterChainProxy.java:331)
at org.springframework.security.web.context.SecurityContextPersistenceFilter.doFilter(SecurityContextPersistenceFilter.java:105)
at org.springframework.security.web.FilterChainProxy$VirtualFilterChain.doFilter(FilterChainProxy.java:331)
at org.springframework.security.web.context.request.async.WebAsyncManagerIntegrationFilter.doFilterInternal(WebAsyncManagerIntegrationFilter.java:56)
at org.springframework.web.filter.OncePerRequestFilter.doFilter(OncePerRequestFilter.java:107)
at org.springframework.security.web.FilterChainProxy$VirtualFilterChain.doFilter(FilterChainProxy.java:331)
at org.springframework.security.web.FilterChainProxy.doFilterInternal(FilterChainProxy.java:214)
at org.springframework.security.web.FilterChainProxy.doFilter(FilterChainProxy.java:177)
at org.springframework.web.filter.DelegatingFilterProxy.invokeDelegate(DelegatingFilterProxy.java:347)
at org.springframework.web.filter.DelegatingFilterProxy.doFilter(DelegatingFilterProxy.java:263)
at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:193)
at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:166)
at org.springframework.web.filter.RequestContextFilter.doFilterInternal(RequestContextFilter.java:99)
at org.springframework.web.filter.OncePerRequestFilter.doFilter(OncePerRequestFilter.java:107)
at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:193)
at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:166)
at org.springframework.web.filter.HttpPutFormContentFilter.doFilterInternal(HttpPutFormContentFilter.java:108)
at org.springframework.web.filter.OncePerRequestFilter.doFilter(OncePerRequestFilter.java:107)
at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:193)
at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:166)
at org.springframework.web.filter.HiddenHttpMethodFilter.doFilterInternal(HiddenHttpMethodFilter.java:81)
at org.springframework.web.filter.OncePerRequestFilter.doFilter(OncePerRequestFilter.java:107)
at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:193)
at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:166)
at org.springframework.web.filter.CharacterEncodingFilter.doFilterInternal(CharacterEncodingFilter.java:197)
at org.springframework.web.filter.OncePerRequestFilter.doFilter(OncePerRequestFilter.java:107)
at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:193)
at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:166)
at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:199)
at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:96)
at org.apache.catalina.authenticator.AuthenticatorBase.invoke(AuthenticatorBase.java:478)
at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:140)
at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:81)
at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:87)
at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:342)
at org.apache.coyote.http11.Http11Processor.service(Http11Processor.java:803)
at org.apache.coyote.AbstractProcessorLight.process(AbstractProcessorLight.java:66)
at org.apache.coyote.AbstractProtocol$ConnectionHandler.process(AbstractProtocol.java:868)
at org.apache.tomcat.util.net.NioEndpoint$SocketProcessor.doRun(NioEndpoint.java:1459)
at org.apache.tomcat.util.net.SocketProcessorBase.run(SocketProcessorBase.java:49)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at org.apache.tomcat.util.threads.TaskThread$WrappingRunnable.run(TaskThread.java:61)
at java.lang.Thread.run(Thread.java:748)
```