How to figure out why new content is not showing up
Although new content was successfully published to the live site, old data is still shown.
The new data lives in the database, yet the old one still owns the show - it must be cached somewhere along the way! The article has a few major parts:
- Describe the caching layers
- Check which layer stores the obsolete content
- Collect a data set sufficient to track down the cause
- Investigate the collected data
Looking at a simplified schema of the request pipeline, you can see why it is futile to investigate missing content by checking Sitecore logs alone:
The browser caches whatever it is asked to cache (e.g. via headers like Cache-Control) - request any enterprise-scale site from the Fiddler composer for a good example.
Being part of the HTTP protocol definition, these headers can be understood and honored by any network device along the way (e.g. a proxy), which may cache the response as well.
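For illustration, a response carrying headers like these (the values are arbitrary) tells the browser - and any proxy in between - that it may reuse the response for an hour without asking the server again:

```
HTTP/1.1 200 OK
Cache-Control: public, max-age=3600
Content-Type: text/html; charset=utf-8
```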
A content delivery network (aka CDN) typically acts as a caching layer for media (videos, images)...
No matter how fast your server hard drive is, it still has limited speed; a CDN is logically an additional drive with almost infinite speed that acts as cache middleware.
Since the CDN knows nothing about content changes, it will keep serving old content unless explicitly told to refresh its cache (through the CDN API - by calling its purge/'clear' handler).
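Hooked into publish:end, such a purge call could look like the sketch below; the endpoint URL, payload shape, and auth header are hypothetical stand-ins for whatever your CDN vendor's invalidation API actually expects:

```csharp
using System.Net.Http;
using System.Text;
using System.Threading.Tasks;

public static class CdnPurger
{
    private static readonly HttpClient Client = new HttpClient();

    // Asks the CDN to drop its cached copy of the given URL.
    public static async Task PurgeAsync(string publishedUrl)
    {
        var request = new HttpRequestMessage(
            HttpMethod.Post,
            "https://api.example-cdn.com/v1/purge") // hypothetical endpoint
        {
            Content = new StringContent(
                "{\"urls\":[\"" + publishedUrl + "\"]}",
                Encoding.UTF8,
                "application/json")
        };
        request.Headers.Add("Authorization", "Bearer <token>"); // hypothetical auth scheme

        var response = await Client.SendAsync(request);
        response.EnsureSuccessStatusCode();
    }
}
```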
The Hypertext Transfer Protocol Stack (aka the HTTP.sys driver) provides kernel-level caching:
In plain words, a request may be served without ever reaching your application!
Although the approach improves request throughput, chances are it might serve obsolete content.
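You can see what HTTP.sys currently holds in its kernel cache from an elevated command prompt on the server:

```
netsh http show cachestate
```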
IIS user-mode caching is designed to cover the kernel-level cache limitations and can be easily configured via the IIS Manager UI,
which results in configuration like this being generated:
```xml
<caching>
  <profiles>
    <add extension=".keepSitecoreSimple" policy="CacheForTimePeriod" kernelCachePolicy="DontCache" duration="00:00:30" varyByHeaders="x-header" varyByQueryString="publish" />
  </profiles>
</caching>
```
ASP.NET output caching allows reusing the produced markup, noticeably reducing the performance cost of request processing.
Output caching is vital for highly loaded applications - it exchanges expensive CPU for relatively cheap RAM.
Needless to say, the pure ASP.NET caching engine knows nothing about Sitecore publishing mechanisms, so it will not refresh itself automatically.
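For reference, this is what plain ASP.NET MVC output caching looks like; markup produced by such an action stays cached for the configured duration, and a Sitecore publish will not invalidate it:

```csharp
using System.Web.Mvc;

public class NewsController : Controller
{
    // The rendered markup is cached for 60 seconds per parameter combination;
    // publishing new content in Sitecore does NOT clear this cache.
    [OutputCache(Duration = 60, VaryByParam = "*")]
    public ActionResult LatestNews()
    {
        return View();
    }
}
```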
Sitecore's analogue of ASP.NET output (HTML) caching gets cleared whenever the publish:end:remote event is processed by a CD server, for the sites defined in the HtmlCacheClearer configuration:
```xml
<handler type="Sitecore.Publishing.HtmlCacheClearer, Sitecore.Kernel" method="ClearCache">
  <sites hint="list">
    <site>website</site>
    <site>CustomSite</site>
  </sites>
</handler>
```
Whenever the Sitecore HtmlCacheClearer runs, it leaves a message in the logs:

```
INFO HtmlCacheClearer done.
```
Sitecore data units (items) are stored in a hierarchical format that can be inspected via the following techniques:
- Sitecore Query - expensive in terms of CPU, as it goes through all of the application business logic
- Fast Query - translated into a direct SQL query -> gets slower over time as the volume of content grows and becomes a bottleneck in loaded solutions
The Content Search API brings the following advantages when acting as a data source:
- Fast navigation through content hierarchies, since each indexed document knows all of its parents
- The possibility to configure a separate index per logical entity (like news), whitelisting only the needed fields
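A minimal sketch of reading data through the Content Search API; the index name and the template filter are made up for illustration:

```csharp
using System.Collections.Generic;
using System.Linq;
using Sitecore.ContentSearch;
using Sitecore.ContentSearch.SearchTypes;

public static class NewsRepository
{
    public static List<SearchResultItem> GetLatestNews()
    {
        // Query the index instead of walking the content tree item by item.
        // "news_index" is a hypothetical index name.
        using (var context = ContentSearchManager.GetIndex("news_index").CreateSearchContext())
        {
            return context.GetQueryable<SearchResultItem>()
                .Where(item => item.TemplateName == "News")
                .Take(10)
                .ToList();
        }
    }
}
```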
The job that updates the index is triggered on publish:end - at the same time the HTML cache is cleared.
Since it takes time for the crawling job to finish, the index contains obsolete data until indexing completes.
Should an index-dependent control rely on Sitecore HTML caching, it must be configured to be cleared on index update to cover the gap between the moment indexing starts and the moment it ends, as the patch sketch below shows:
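One common way to wire this up is to subscribe the same HtmlCacheClearer to the indexing:end:remote event; treat this as a sketch and verify the event name and handler wiring against your Sitecore version:

```xml
<!-- Sketch: clear HTML caches once remote indexing finishes -->
<configuration xmlns:patch="http://www.sitecore.net/xmlconfig/">
  <sitecore>
    <events>
      <event name="indexing:end:remote">
        <handler type="Sitecore.Publishing.HtmlCacheClearer, Sitecore.Kernel" method="ClearCache">
          <sites hint="list">
            <site>website</site>
          </sites>
        </handler>
      </event>
    </events>
  </sitecore>
</configuration>
```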
The Sitecore caching layer is automatically scavenged whenever a data-change event registered in the Event Queue is processed.
The browser/proxy/CDN layers can be bypassed by requesting the page directly from the server that hosts the application - e.g. by opening a browser in incognito mode on the CD server.
Alternatively, you can temporarily disable the kernel/user caching layers:
```xml
<system.webServer>
  <caching enabled="false" enableKernelCache="false" />
</system.webServer>
```
Should these caching layers be the culprit, they would bother you constantly and would normally have been identified already during the development stage.
ASP.NET output caching is unlikely to be the culprit, as Sitecore does not support it - if it were involved, old data would always be shown.
Nevertheless, you could still hunt for OutputCacheAttribute usages in your project.
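For instance, a quick PowerShell sweep over the source tree:

```powershell
# List every C# file, line number, and line that mentions OutputCache
Get-ChildItem -Recurse -Filter *.cs |
    Select-String -Pattern "OutputCache" |
    Select-Object Path, LineNumber, Line
```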
The Sitecore caching layer can be cleaned via the sitecore/admin/cache.aspx page.
If a manual cache scavenge helps, the obsolete data was sitting in a cache.
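The same cleanup can be scripted against the public Sitecore API (a sketch; run it from a context where Sitecore.Kernel is loaded, e.g. a custom admin page):

```csharp
using Sitecore.Caching;

// Drops the content of every cache registered with Sitecore,
// mirroring the "Clear all" action on cache.aspx.
CacheManager.ClearAllCaches();
```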
By default, indexing queries are written into the Search log - one can locate a query and replay it via the search engine UI (Azure Search portal / Solr Admin / Luke).
Alternatively, in the Content Editor you can navigate to the newly published item in the publishing target database (web) and trigger Re-Index Tree from the Developer ribbon (hidden by default -> right-click in empty ribbon space to bring it up).
Game plan:
- Configure CM to write whatever was published into the logs (see the patch sketch after this list)
- Configure CD to print event processing durations via a Sitecore.config change:

  ```xml
  <events timingLevel="high">
  ```

- Modify an item -> write down its ItemId along with the old/new values
- Collect a full memory snapshot of Sitecore from the CD server to act as a baseline (e.g. via ProcDump, shown after this list)
- Prepare (but do not start) dynamic code profiling on CD to record how remote events are processed by CD
- Trigger a publish from CM and start collecting the performance profile on CD
- Stop dynamic profiling ~30 seconds after the publishing is over
- Collect a full memory snapshot of Sitecore from the CD server to be compared with the 'before' one
- Navigate to the 'faulting' page that shows 'obsolete' content and save its markup to an HTML file (we will use it to locate the cached markup in the memory snapshots)
- Make a screenshot of the page and highlight the area with the 'old' content
- Collect the content of the publishing target 'EventQueue' table -> SELECT * FROM [EventQueue] ORDER BY [Stamp] DESC
- Grab the Sitecore logs + configs generated/used during the test from both CM and CD
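For the first step, a patch sketch: the UpdateStatistics processor in the publishItem pipeline has a traceToLog switch that makes CM log every published item; verify the element against the stock Sitecore.config of your version:

```xml
<!-- Sketch: enable per-item publish tracing on CM -->
<processor type="Sitecore.Publishing.Pipelines.PublishItem.UpdateStatistics, Sitecore.Kernel" runIfAborted="true">
  <traceToLog>true</traceToLog>
</processor>
```

Full memory snapshots of the w3wp.exe worker process can be taken with Sysinternals ProcDump, for example:

```
procdump -ma <w3wp PID> cd_before_publish.dmp
```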
Open the 'CM' logs and verify that the item publish event was indeed recorded. It happens that an item is not published due to some restrictions, which are explained in the message:

```
INFO ##Publish Item: Name=Sample Item, Uri=sitecore://master/{c13d6961-df4a-418a-a20c-60673b356207}?lang=nl-NL&ver=1, Operation=Updated, ChildAction=Allow, Explanation=Version 'nl-NL_1' was published.
```
Open the EventQueue export and locate the SavedItemRemoteEvent that matches the ItemId; write down its stamp, we'll use it later:
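A sketch of a query to narrow the export down (column names as in the default EventQueue schema; the event type filter matches the fully qualified .NET type name stored in the table):

```sql
SELECT TOP 20 [EventType], [InstanceData], [InstanceName], [Stamp], [Created]
FROM [EventQueue]
WHERE [EventType] LIKE '%SavedItemRemoteEvent%'
ORDER BY [Stamp] DESC
```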
Sitecore keeps the last processed timestamp in memory. We'll extract the in-memory last processed timestamp and compare it with the stamp known for our ItemId:
- Should the in-memory stamp be smaller than our item's event stamp, the Event Queue processing lags behind, and the processing speed should be checked via the PerfView profile.
- Should the in-memory stamp be larger, the event was processed, but obsolete data was left in some caching layer; the main question is which one.
The old content (markup) can be located by a string match in the memory snapshot:

```
!strings /m:oldValue
```

Following up with the !gcroot command, we'll get an understanding of which object owns the data:
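Put together, the WinDbg session looks roughly like this (a sketch assuming SOS plus an extension providing !strings, such as SOSEX, are loaded; the address is illustrative):

```
.load sosex
$$ Find addresses of strings matching the obsolete markup
!strings /m:oldValue
$$ Walk the reference chain that keeps one of the matched strings alive
!gcroot 000001a2b3c4d5e8
```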
The in-memory application code can be re-generated from the snapshot and the cache cleanup algorithms questioned.
To show my appreciation for you reading the whole article, here is a list of scenarios known to me:
- The `EnableEventQueues` setting is `false` in the Sitecore configuration = remote events are not replayed = nobody clears the in-memory caches.
- Multiple servers with the same `InstanceName` = a remote event gets marked as processed by one instance = another instance with the same name skips it = old data remains in process memory. Leaving a non-blank `InstanceName` setting in Azure Web Apps with autoscaling enabled = problem.
- Multiple worker processes (`Application pool` -> `Advanced Settings` -> `Maximum Worker Processes`) = same as the previous scenario, but trickier to detect. Should you have a single CD and phantom cache problems, I'd suggest double-checking this point.
- The last processed timestamp in the `Properties` table is set to a crazily high value, thereby causing new events not to be picked up - the before/after memory snapshots would indicate that; the solution is to remove the keys containing `_lastProcessed` from the `Properties` table and restart the application (see the SQL sketch after this list).
- Custom sites are not added to the `HtmlCacheClearer` list - just add them.
- An index-dependent component does not have 'clear on index update' set - thus the HTML cache gets populated before the index update job finishes - just add it.
- Too many changes (more than the configured index rebuild threshold) were published = full index rebuild + a SwitchOnRebuild index used = a long delay before indexing ends = an expected delay. The solution: either disable `checkForThreshold` on the index in question, or just wait. A dynamic code profile would show if any computed field consumes much time.
- Remote events are replayed more slowly than they are raised - publishing may run in multiple threads on a dedicated box, while the CD is constantly under load and processes remote events in a single thread only -> inspecting the last processed stamps in the before/after memory snapshots roughly shows the number of events processed per minute. Dynamic code profiles would explain the nature of the wall-clock time consumption. The `Caching.CacheKeyIndexingEnabled.*` settings were introduced to improve cache scavenge timings in exchange for additional RAM usage.
- A custom caching layer is used - you'll be able to locate the obsolete data in the memory snapshots + `!gcroot` will highlight your object in the chain.
- Exceptions in custom handlers prevent further event subscribers from doing their part - a dynamic code profile records `Exception` events + exceptions are written to the logs.
- A handler clears the cache if a certain field was changed, but it never triggers despite the field being known to change. Sitecore can whitelist the field list included in the `SavedItemRemote` event to avoid dull data hammering the Event Queue.
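For the `_lastProcessed` scenario, a SQL sketch (the Properties table stores [Key]/[Value] pairs; back the table up before deleting anything):

```sql
-- Inspect the per-instance "last processed" bookmarks first
SELECT [Key], [Value]
FROM [Properties]
WHERE [Key] LIKE '%_lastProcessed%';

-- Remove them so the instance re-reads the Event Queue from scratch,
-- then restart the application
DELETE FROM [Properties]
WHERE [Key] LIKE '%_lastProcessed%';
```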