Dispatcher - bartoszWesolowski/aem-tips GitHub Wiki

Dispatcher

  • Adobe Experience Manager's caching and/or load balancing tool

Caching methods

Content updates

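  • deletes the modified files from the cache
  • deletes all files that start with the same handle (if /en/index.html is updated, all files starting with /en/index. are deleted, so /en/index.json goes too)
  • touches the statfile: updates its timestamp to record the date of the last change
  • invalidated files are removed but not replaced immediately; they are fetched from AEM again on the next request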
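A minimal Python sketch of the content-update steps, assuming a plain file cache under a docroot directory. The function name invalidate_on_update is hypothetical and this is not Dispatcher's actual implementation; it only mirrors the bullets: delete every cached file sharing the updated file's handle, then touch the statfile.

```python
from pathlib import Path


def invalidate_on_update(docroot: Path, updated_uri: str, statfile: Path) -> list:
    """Hypothetical model of a content-update invalidation.

    Deletes every cached file in the same directory that starts with the
    updated file's handle (for /en/index.html the handle is "index."), then
    touches the statfile so its timestamp records the last change.
    Returns the deleted paths, sorted.
    """
    target = docroot / updated_uri.lstrip("/")
    handle = target.name.split(".", 1)[0] + "."  # "index." for "index.html"
    deleted = []
    if target.parent.is_dir():
        # sorted() materializes the listing before any file is unlinked
        for cached in sorted(target.parent.iterdir()):
            if cached.is_file() and cached.name.startswith(handle):
                cached.unlink()  # removed now, re-fetched from AEM on the next request
                deleted.append(cached)
    statfile.touch()  # creates the statfile or updates its modification time
    return sorted(deleted)
```

Note that /en/indexer.html survives, because only names starting with "index." share the handle.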

Auto-invalidation

  • invalidates parts of the cache without deleting any files: on a content update only the statfile timestamp is changed to reflect the last update
  • when a file is requested, Dispatcher checks whether the cached file is newer than the statfile and, depending on the outcome, serves the cached file or fetches the content from AEM
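The auto-invalidation check boils down to a timestamp comparison. This is a hypothetical sketch (needs_refetch and its parameters are made-up names) that also assumes the /invalidate section of /cache can be reduced to a plain list of glob patterns:

```python
import os
from fnmatch import fnmatchcase


def needs_refetch(cached_file: str, statfile: str, uri: str, invalidate_globs: list) -> bool:
    """Decide whether a cached file must be fetched again from AEM.

    Simplified model: only URIs matching one of the /invalidate globs are
    auto-invalidated, and such a file counts as outdated when it was last
    modified before the statfile was touched.
    """
    if not os.path.exists(cached_file):
        return True  # not cached (e.g. deleted by a content update)
    if not any(fnmatchcase(uri, pattern) for pattern in invalidate_globs):
        return False  # not auto-invalidated: served from the cache as-is
    return os.path.getmtime(cached_file) < os.path.getmtime(statfile)
```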

Determining whether a document is subject to caching

By default, Dispatcher requests the document from AEM (instead of serving it from the cache) when:

  • the request URI contains a question mark "?"; this usually indicates a dynamic page, such as a search result, which does not need to be cached
  • the file extension is missing; the web server needs the extension to determine the document type (the MIME type)
  • the authentication header is set (this can be configured)

Sticky connections

  • ensures that all documents for one user are composed on the same AEM instance
  • important for personalized pages and session data (logged-in users)
  • when using sticky connections, review the caching setup so that user-specific data is not cached
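A hypothetical Python sketch of the idea behind sticky connections, assuming the chosen renderer is remembered in a cookie sent back by the client. StickyRouter and the renderer names are illustrative only, not Dispatcher code:

```python
from itertools import cycle
from typing import Optional


class StickyRouter:
    """Pin each session to one AEM renderer (illustrative model).

    The renderer's name is assumed to be stored in a cookie; requests that
    carry a known renderer name go back to that renderer, while new sessions
    are assigned round-robin.
    """

    def __init__(self, renderers: list) -> None:
        self._renderers = list(renderers)
        self._next = cycle(self._renderers)

    def route(self, cookie_value: Optional[str]) -> str:
        if cookie_value in self._renderers:
            return cookie_value  # existing session: keep the same AEM instance
        return next(self._next)  # new or unknown session: caller sets the cookie
```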

Dispatcher in front of author

  • used to improve authoring performance

Installing Dispatcher for Apache

In Apache httpd.conf:

  • add LoadModule dispatcher_module modules/dispatcher-apache<...>.so
  • configure the Dispatcher directives, such as DispatcherConfig, DispatcherLog and DispatcherLogLevel (Apache does not allow comments after a directive, so they go on their own lines):
<IfModule disp_apache2.c>
    DispatcherConfig conf/dispatcher.any
    DispatcherLog logs/dispatcher.log
    # log level: 0 - error, 1 - warning, 2 - info, 3 - debug, 4 - trace
    DispatcherLogLevel 3
    DispatcherDeclineRoot 0
    # 0 - uses the original URL, 1 - uses the URL processed by handlers that ran before Dispatcher (rewrites)
    DispatcherUseProcessedURL 0
    # 0 - error responses from AEM are passed to the client, 1 - errors are handled by Apache (e.g. ErrorDocument)
    DispatcherPassError 0
    DispatcherKeepAliveTimeout 60
</IfModule>
  • Set handler:
<VirtualHost 123.45.67.89>
    ServerName www.mycompany.com
    DocumentRoot /usr/apachecache/docs
    <Directory /usr/apachecache/docs>
        <IfModule disp_apache2.c>
            SetHandler dispatcher-handler
        </IfModule>
        AllowOverride None
    </Directory>
</VirtualHost>

Dispatcher configuration

  • by default stored in the dispatcher.any file

Defining farms

  • a farm defines how Dispatcher handles a specific website or set of URLs
  • with a single farm, all requests are handled the same way
  • the /farms property is multi-valued (it can contain several farms)

Including config files:

/farms
  {
  $include "myFarm.any" # or "farm_*.any" to include all files matching
  }

Environment variables can be used:

/renders {
  /0001 {
    /hostname "${PUBLISH_IP}"
    /port "8443"
  }
}
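A rough Python model of the ${NAME} substitution. expand_env is a hypothetical helper, not how Dispatcher parses the file, and leaving unknown placeholders untouched is an assumption. Only the braced form is replaced, so a directive like $include is left alone:

```python
import os
import re

# ${NAME} placeholders, as used in /hostname "${PUBLISH_IP}"
_PLACEHOLDER = re.compile(r"\$\{([A-Za-z_][A-Za-z0-9_]*)\}")


def expand_env(config_text: str) -> str:
    """Replace ${NAME} with the value of environment variable NAME.

    Assumption: placeholders whose variable is not set are kept as-is.
    """
    return _PLACEHOLDER.sub(lambda m: os.environ.get(m.group(1), m.group(0)), config_text)
```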

Client Headers

  • /clientheaders
  • defines which headers are passed from the client to AEM
  • when customizing, the list must contain every header that should be passed
/clientheaders
  {
  "CSRF-Token"
  "X-Forwarded-Proto"
  "referer"
   ...

  }
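A minimal sketch of /clientheaders acting as an allowlist. forward_headers is a hypothetical name, and the case-insensitive comparison is an assumption based on HTTP header names being case-insensitive:

```python
def forward_headers(request_headers: dict, clientheaders: list) -> dict:
    """Keep only the request headers listed in /clientheaders (illustrative)."""
    allowed = {name.lower() for name in clientheaders}
    return {name: value for name, value in request_headers.items() if name.lower() in allowed}
```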

Virtualhosts

  • list of all hostname/URI combinations that Dispatcher accepts for this farm (AEM instance)
  • * can be used as a wildcard
  • format: [scheme]host[uri][*]
  /virtualhosts
    {
    "www.myCompany.com"
    "www.mySubDivison.*"
    }

Accepting all requests:

   /virtualhosts
    {
    "*"
    }

Matching a virtualhost

  • Dispatcher starts from the lowest farm in the file and works upward
  • within each farm it starts with the topmost virtual host and works downward
  • the first virtual host that matches the scheme, host and URI of the request is used
  • if none matches, the first virtual host that matches the host is used
  • if none matches the host, the topmost virtual host of the topmost farm is used (so the default should be the first virtual host in the first farm)
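The matching order can be sketched in Python under simplifying assumptions: each virtual host is pre-split into hypothetical host, scheme and uri glob fields ("*" when omitted), farms are given in file order, and matching is case-sensitive. VHost and select_farm are illustrative names:

```python
from fnmatch import fnmatchcase
from typing import NamedTuple


class VHost(NamedTuple):
    host: str          # glob, e.g. "www.mySubDivison.*"
    scheme: str = "*"  # glob for the scheme, "*" when omitted
    uri: str = "*"     # glob for the URI, "*" when omitted


def select_farm(farms: list, scheme: str, host: str, uri: str) -> str:
    """Return the name of the farm whose virtual host best matches the request.

    `farms` is a list of (farm_name, [VHost, ...]) in dispatcher.any order.
    """
    # Lowest farm first, and within each farm the topmost virtual host first.
    ordered = [(name, vh) for name, vhosts in reversed(farms) for vh in vhosts]
    for name, vh in ordered:  # 1. scheme, host and uri all match
        if (fnmatchcase(host, vh.host) and fnmatchcase(scheme, vh.scheme)
                and fnmatchcase(uri, vh.uri)):
            return name
    for name, vh in ordered:  # 2. only the host matches
        if fnmatchcase(host, vh.host):
            return name
    return farms[0][0]  # 3. fall back to the topmost farm
```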

Enabling Secure Sessions - /sessionmanagement

  • configured under the /farms property
  • in /cache, /allowAuthorized must be set to "0"
  • used with closed user groups (CUGs) to make pages login-protected
/sessionmanagement
  {
  /directory "/usr/local/apache/.sessions" # required - directory where sessions are stored
  /encode "md5" # optional - "md5" (default) or "hex"
  /header "HTTP:authorization" # optional - header (HTTP: prefix) or cookie (COOKIE: prefix) that holds the authorization information
  /timeout "800" # optional - seconds after the last use before the session expires (default "800")
  }

Page renderers

  • /renders property defines the AEM instance(s) used to render the actual content. Options:
  • /receiveTimeout - number of milliseconds the response is allowed to take (default "600000", i.e. 10 minutes). If the timeout is reached while the response headers are parsed, HTTP 504 is returned; if it is reached while the response body is read, the incomplete response is returned to the client and any cache files written for it are deleted
  • /secure - if set to "1" then use HTTPS to communicate with AEM
/renders
  {
    /myRenderer
      {
      # hostname or IP of the renderer, "127.0.0.1" if aem is running on same machine
      /hostname "aem.myCompany.com"
      # port of the renderer
      /port "4503"
      # connection timeout in milliseconds, "0" (default) waits indefinitely
      /timeout "0"
      }
  }

Providing multiple AEM publish instances

  • requests are distributed equally between the two instances
/renders
  {
    /myFirstRenderer
      {
      /hostname "aem.myCompany.com"
      /port "4503"
      }
    /mySecondRenderer
      {
      /hostname "127.0.0.1"
      /port "4503"
      }
  }
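Equal distribution can be pictured as round-robin. This is only an illustration with a hypothetical helper (distribute); Dispatcher's real load balancing is not guaranteed to be strict alternation:

```python
def distribute(requests: list, renderers: list) -> dict:
    """Assign requests to renderers in strict rotation (round-robin model)."""
    assignment = {name: [] for name in renderers}
    for i, request in enumerate(requests):
        assignment[renderers[i % len(renderers)]].append(request)
    return assignment
```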

Filter

  • configures access to content
  • determines which requests are accepted by Dispatcher
  • requests that are not allowed by the filter receive a 404 response
  • if no filter is defined, all requests are accepted
  • best used with an allowlist strategy: deny everything, then allow only what's needed

Filter rule consist of:

  • type: /allow or /deny
  • element of the request: /method, /url, /query, /protocol, /path, /selectors, /extension, /suffix, and the special /glob, which matches the whole request line
  • rules are evaluated against the request line, Method Request-URI HTTP-Version<CRLF>, for example: GET /content/geometrixx-outdoors/en.html HTTP/1.1<CRLF>
  • in filter rules, use double quotes for simple (glob) patterns and single quotes for regular expressions
  • trace logging can be used to debug filtering
/filter {
    /0001  { /glob "*" /type "deny" }
    /0002  { /type "allow" /method "POST" /url "/content/[.]*.form.html" }
    /0003  { /type "deny"  /url "/publish/libs/cq/workflow/content/console/archive*"  }
    /0004  { /type "allow"  /url "/libs/cq/workflow/content/console/archive*"   }
    /0005  {  /type "allow" /extension '(css|gif|ico|js|png|swf|jpe?g)' }
    /0081
      {
      /type "deny"
      /selectors '((sys|doc)view|query|[0-9-]+)'
      /extension '(json|xml)'
      }
}

Query strings

  • filters can restrict which query strings are allowed
  • if a rule contains /query, it only matches when the request's query string matches that pattern
/filter {
 /0001 { /type "deny" /method "POST" /url "/etc/*" }
 /0002 { /type "allow" /method "GET" /url "/etc/*" /query "a=*" } # accepted only if the query string matches "a=*"
}

Vanity urls

  • configured in the farm section
  • when enabled, Dispatcher periodically calls AEM to get the list of vanity URLs
  • if a page is denied by the /filter configuration, Dispatcher checks the vanity URL list and allows access if the denied URL is on it
  • AEM requires an additional package to support vanity URLs
 /vanity_urls {
      /url "/libs/granite/dispatcher/content/vanityUrls.html" # path to vanity servlet
      /file "/tmp/vanity_urls" # path of the file where the vanity URL list is stored
      /delay 300 # seconds between calls to the aem servlet to update urls
 }
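A sketch of the vanity URL fallback, assuming the file configured in /file holds one URL per line (the real file format is an assumption) and using hypothetical helper names:

```python
from pathlib import Path


def load_vanity_urls(vanity_file: Path) -> set:
    """Read the locally stored vanity URL list (assumed: one URL per line)."""
    if not vanity_file.exists():
        return set()
    lines = vanity_file.read_text().splitlines()
    return {line.strip() for line in lines if line.strip()}


def is_accessible(url: str, denied_by_filter: bool, vanity_urls: set) -> bool:
    # Allowed if the filter lets it through, or if it is a known vanity URL.
    return not denied_by_filter or url in vanity_urls
```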

Resolving filters

  • if multiple rules match a request, the last matching rule takes effect
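The filter rules described above can be modelled in Python as follows. Assumptions: glob patterns are plain strings, regular expressions are compiled re patterns that must match the whole element, the request line uses HTTP/1.1, and a request that no rule matches is rejected. The helper names are hypothetical:

```python
import re
from fnmatch import fnmatchcase


def request_elements(method: str, url: str) -> dict:
    """Split a request into the elements filter rules can test (simplified)."""
    path, _, query = url.partition("?")
    parts = path.rsplit("/", 1)[-1].split(".")
    return {
        "method": method,
        "url": path,
        "query": query,
        "selectors": ".".join(parts[1:-1]),
        "extension": parts[-1] if len(parts) > 1 else "",
        "glob": f"{method} {url} HTTP/1.1",  # assumption: HTTP/1.1 request line
    }


def element_matches(pattern, value: str) -> bool:
    # str = glob ("double quotes"); compiled regex = 'single quotes', full match assumed
    if isinstance(pattern, re.Pattern):
        return pattern.fullmatch(value) is not None
    return fnmatchcase(value, pattern)


def is_allowed(rules: list, method: str, url: str) -> bool:
    if not rules:
        return True  # no filter defined: everything is accepted
    elements = request_elements(method, url)
    allowed = False  # assumption: a request that no rule matches is rejected (404)
    for rule in rules:
        criteria = {k: v for k, v in rule.items() if k != "type"}
        if all(element_matches(p, elements[k]) for k, p in criteria.items()):
            allowed = rule["type"] == "allow"  # the last matching rule wins
    return allowed
```

For example, with a deny-all first rule, an allow rule for extension "json" and a later deny rule for the "query" selector, /content/page.model.json is allowed while /content/page.query.json is rejected.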

Configuring cache

  • /docroot - directory where files are cached; must be the same as the web server's document root
  • /serveStaleOnError - if set to "1", Dispatcher does not delete invalidated content until the render server returns a successful response; if AEM responds with an error, the outdated file is served with HTTP status 111 (Revalidation Failed)
  • /allowAuthorized - if set to "1", requests containing an authorization header or an authorization or login-token cookie can be cached. The default "0" prevents cached documents from being served to users who do not have the required rights
/cache
  {
  /docroot "/opt/dispatcher/cache"
  /statfile  "/tmp/dispatcher-website.stat"          
  /allowAuthorized "0"
      
  /rules
    {
    # List of files that are cached
    }

  /invalidate
    {
    # List of files that are auto-invalidated
    }
  }
  

Dispatcher will never cache when

  • the request URI contains "?" (dynamic pages)
  • the file extension is missing (the extension is needed to determine the MIME type)
  • an authentication header is set (can be configured)
  • AEM responds with a no-cache, no-store or must-revalidate header
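The conditions above can be combined into one check. A hypothetical sketch (is_cacheable is not a Dispatcher API), assuming the no-cache directives arrive in the Cache-Control header and that /allowAuthorized is passed in as a flag:

```python
def _cookie_names(cookie_header: str) -> set:
    return {part.split("=", 1)[0].strip() for part in cookie_header.split(";") if part.strip()}


def is_cacheable(uri: str, request_headers: dict, cache_control: str = "",
                 allow_authorized: bool = False) -> bool:
    """Hypothetical check combining the 'never cache' conditions."""
    if "?" in uri:
        return False  # dynamic page
    if "." not in uri.rsplit("/", 1)[-1]:
        return False  # no extension, so no MIME type
    headers = {k.lower(): v for k, v in request_headers.items()}
    cookies = _cookie_names(headers.get("cookie", ""))
    has_auth = "authorization" in headers or bool(cookies & {"authorization", "login-token"})
    if has_auth and not allow_authorized:
        return False  # /allowAuthorized "0" (the default)
    directives = {d.strip().lower() for d in cache_control.split(",")}
    return not directives & {"no-cache", "no-store", "must-revalidate"}
```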

Note

  • when you have two Dispatchers, make sure each request goes through only one of them; Dispatcher does not handle requests that come from another Dispatcher

Documentation
