Search Engine Safe and URL Improvements - Mach-II/Mach-II-Framework GitHub Wiki
By Kurt Wiersma (kurt@…)
- Overview
- BuildURL Enhancements
- SES Routes Overview
- Configuring Routes
- Apache Rewrite Rules
- Tips on Building Good URL Aliases
- Additional Comments and Considerations
- Comments
- Comment by anonymous on Thu 19 Feb 2009 05:17:18 AM EST
- Comment by peterfarrell on Thu 19 Feb 2009 12:11:19 PM EST
- Comment by anonymous on Wed 25 Feb 2009 05:12:09 AM EST
- Comment by kurtwiersma on Thu 26 Feb 2009 01:09:05 PM EST
- Comment by anonymous on Thu 26 Feb 2009 02:06:37 PM EST
- Comment by anonymous on Fri 27 Feb 2009 12:28:17 PM EST
- Comment by anonymous on Fri 27 Feb 2009 12:40:41 PM EST
- Comment by peterfarrell on Fri 27 Feb 2009 12:49:24 PM EST
- Comment by peterfarrell on Fri 27 Feb 2009 01:00:14 PM EST
- Comment by anonymous on Fri 27 Feb 2009 01:39:06 PM EST
- Comment by peterfarrell on Tue 10 Mar 2009 10:06:38 PM UTC
This is a working document and is subject to change.
This document has not been completed.
This M2SFP feature has been filed under tickets #32, #237, #238 and #241.
Several members of the Mach-II community have requested enhancements to the buildUrl() functionality that was introduced in Mach-II 1.5 as part of our support for search engine friendly URLs. In addition to these enhancements, Team Mach-II also proposes a new feature for building special search engine friendly URL schemes known as "routes".
Based on feedback from the community, we are adding additional functionality to the way that buildUrl() works:
- Provide a new method called buildCurrentUrl() that allows developers to build the current URL (in the browser) for the current request.
- Provide an interface into the current URL to change or add new values to the current URL.
- Order all incoming parameters for a URL in alphabetical order. All URLs with the same parameters will have the same ordering.
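The benefit of alphabetical ordering can be illustrated with a minimal Python sketch. This is not Mach-II code: the function name `build_ses_url`, the `/index.cfm` base and the plain case-sensitive sort are assumptions made for illustration only.

```python
def build_ses_url(event, params, base="/index.cfm"):
    """Build a name/value SES URL with parameters in alphabetical order.

    Hypothetical helper for illustration; not part of Mach-II.
    """
    parts = [base, "event", event]
    # Sort by parameter name so the same set of parameters always yields
    # the same URL, whatever order the caller supplied them in.
    for name in sorted(params):
        parts.extend([name, str(params[name])])
    return "/".join(parts) + "/"


first = build_ses_url("showProduct", {"productId": 167809, "color": "red"})
second = build_ses_url("showProduct", {"color": "red", "productId": 167809})
print(first)
print(first == second)
```

Because both calls produce the same string, a page is reachable under one URL per parameter set rather than one per parameter ordering.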
We've added an optional property called urlExcludeEventParameter that removes the event parameter name (such as the event in the example URL /index.cfm/event/about/) when using SES URLs in Mach-II (this has no effect on URLs that use query string parameters). This makes the event value "positional" within the URL. Turning this setting on changes the standard SES URL from /index.cfm/event/about/ to /index.cfm/about/. You can turn on this feature by adding the following property to your XML configuration file:
<property name="urlExcludeEventParameter" value="true"/>
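To make the "positional" event value concrete, here is a small Python sketch of how a URL could be built and parsed with and without the event parameter name. The helper names `build_url` and `parse_path` and the `exclude_event_parameter` flag are hypothetical stand-ins for the property above; the real framework logic may differ.

```python
def build_url(event, params=None, exclude_event_parameter=False, base="/index.cfm"):
    """Build an SES URL, optionally leaving out the 'event' key name."""
    parts = [base]
    if not exclude_event_parameter:
        parts.append("event")
    parts.append(event)
    for name in sorted(params or {}):
        parts.extend([name, str(params[name])])
    return "/".join(parts) + "/"


def parse_path(path_info, exclude_event_parameter=False):
    """Turn the path that follows index.cfm into a dict of event args."""
    elements = [element for element in path_info.split("/") if element]
    args = {}
    if exclude_event_parameter and elements:
        # The first element is the event value because its position is fixed.
        args["event"] = elements.pop(0)
    # Everything that remains is read as name/value pairs.
    for name, value in zip(elements[0::2], elements[1::2]):
        args[name] = value
    return args


print(build_url("about"))
print(build_url("about", exclude_event_parameter=True))
print(parse_path("/about/", exclude_event_parameter=True))
```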
The goal of the routes feature is to provide an easy way to specify which URL elements map to which event args when the request is processed. This way the key names of the URL parameters are not included in route URLs. All parameters are positional instead of the name / value pairs used when creating URLs with buildUrl(). These positional locations are then mapped back to name / value pairs. This makes route style URLs cleaner and friendlier to both search engines and users. All routes are developer defined and customizable.
For example, if we wanted to build a URL that shows a product using buildUrl(), the corresponding code would be something like this:
<a href="#BuildUrl('showProduct', 'productId=167809')#">See Product 167809</a>
The corresponding output of the source code would look like this (depending on your setting):
<a href="/event/showProduct/productId/167809/">See Product 167809</a>
However, we could set up a route that uses positional locations for URL parameters. Remember that since we have to define a route, we can specify which position each URL parameter will take. This is how we might build a route URL to show our fictitious product:
<a href="#BuildRouteUrl('showProduct', 'productId=167809')#">See Product 167809</a>
The corresponding output of the source code would look like this (depending on how you configured the route):
<a href="/p/167809/">See Product 167809</a>
A route can have required and optional argument parameters when the route URL is built. Required arguments must be passed into buildRouteUrl(), while optional arguments need not be (they fall back to the default values defined in the route configuration). More on how these are used below. Also, buildRouteUrl() supports adding query string parameters to a route if you want to pass additional information that should not be part of the concrete route definition.
Example Syntax:
buildRouteUrl("routeName", "urlParameters", "queryStringParameters", "urlBase")
Argument Name | Required | Default | Description |
---|---|---|---|
routeName | required | n/a | The name of the route to build. |
urlParameters | optional | n/a | Name/value pairs (urlArg1=value1 |
queryStringParameters | optional | n/a | Name/value pairs (urlArg1=value1 |
All arguments (required or optional) can define a default value (for example displayType:normal), which may use the Mach-II expression language. However, being a required argument for a route does not mean the argument has to be passed into buildRouteUrl(); it means that the argument will always appear in the built route URL string. The same rule applies for optional arguments.
Routes are always built in the order of route name (or URL alias if defined), required arguments in the order they are defined (using the default if not passed to buildRouteUrl()) and any optional arguments. If an optional argument is not passed to buildRouteUrl(), it will not be used in the final route string since it is optional. Any optional arguments not in the route URL will have their default value from the route configuration. All optional arguments will be placed in the event object when Mach-II parses the route whether or not they are actually in the incoming request URL string.
If you call buildRouteUrl("product", "productId=123456|totalPerPage=20") (assuming the product route defines the optional arguments displayType:normal and totalPerPage:10, in that order), the result is a URL like index.cfm/p/123456/normal/20/. Because the displayType argument was not provided when building the route, its default value of normal is used. Routes always contain the required arguments. If you provide an optional argument that is, say, the third optional argument, the first and second optional arguments are included as well (using their defaults if necessary). This is because routes must have all arguments in a predictable position, so we cannot skip the displayType value: doing so would change the order of the arguments. If we skipped it here, the URL would indicate that displayType is 20, which would be incorrect because the arguments would be "out of position".
However, if we call buildRouteUrl("product", "productId=123456|displayType=fancy"), the result is index.cfm/p/123456/fancy/. The trailing totalPerPage argument can be left out because, when Mach-II receives a request on this route, it adds the default of 10 for totalPerPage since it was not present in the URL. Mach-II therefore announces the showProduct event with displayType set to fancy and totalPerPage set to 10 in the event object.
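The positional rules described above can be sketched in a few lines of Python. This is only a model of the behaviour described in the text, not the framework's implementation: the `ROUTE` dictionary, the helper names and the assumption that the product route defines displayType:normal and totalPerPage:10 as optional arguments are all illustrative.

```python
# Hypothetical in-memory form of the "product" route used in the text.
ROUTE = {
    "event": "showProduct",
    "url_alias": "p",
    "required": [("productId", None)],
    "optional": [("displayType", "normal"), ("totalPerPage", "10")],
}


def parse_args(arg_string):
    """Split 'a=1|b=2' into a dict."""
    if not arg_string:
        return {}
    return dict(pair.split("=", 1) for pair in arg_string.split("|"))


def build_route_url(route, url_parameters="", base="index.cfm"):
    passed = parse_args(url_parameters)
    parts = [base, route["url_alias"]]
    for name, default in route["required"]:
        value = passed.get(name, default)
        if value is None:
            raise ValueError(f"missing required argument: {name}")
        parts.append(value)
    optional = route["optional"]
    # Index of the last optional argument that was actually passed.
    last = max((i for i, (name, _) in enumerate(optional) if name in passed), default=-1)
    # Every optional argument up to that index is written out, falling back
    # to its default, so that later values stay in their expected position.
    for name, default in optional[: last + 1]:
        parts.append(passed.get(name, default))
    return "/".join(parts) + "/"


def parse_route_url(route, path, base="index.cfm"):
    elements = [element for element in path.split("/") if element]
    if elements and elements[0] == base:
        elements = elements[1:]
    if not elements or elements[0] != route["url_alias"]:
        raise ValueError("path does not match this route")
    # Start with every optional default, then overlay the positional values.
    args = {name: default for name, default in route["optional"]}
    names = route["required"] + route["optional"]
    for (name, _), value in zip(names, elements[1:]):
        args[name] = value
    return route["event"], args


print(build_route_url(ROUTE, "productId=123456|totalPerPage=20"))
print(build_route_url(ROUTE, "productId=123456|displayType=fancy"))
print(parse_route_url(ROUTE, "index.cfm/p/123456/fancy/"))
```

The first call fills in displayType with its default so that 20 lands in the totalPerPage position; the second call can stop after displayType because the parser restores the totalPerPage default on the way back in.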
Query string parameters are appended to the end of the route. These parameters cannot be placed in the route as positional elements (without the key name) because they are not part of the route definition (with positional elements we would not know the key name). If you are using SES URLs with the urlDelimiters property set to something like /|/|/, then the output for buildRouteUrl("music", "catId=1234", "offset=5|limit=5") would use name / value pairs, as buildUrl() does:
/music/1234/offset/5/limit/5/
Or, using normal query string parameters with a urlDelimiters value of ?|&|=:
/music/1234/?offset=5&limit=5
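The two outputs above can be reproduced with a short Python sketch. The function `append_query_parameters` is hypothetical, and the reading of the three delimiter slots (first delimiter, delimiter between pairs, delimiter between name and value) as well as the trailing slash rule are assumptions inferred from the two examples, not confirmed framework behaviour.

```python
def append_query_parameters(route_url, query_string_parameters, url_delimiters):
    """Append 'a=1|b=2' style parameters to an already built route URL."""
    first_delim, series_delim, pair_delim = url_delimiters.split("|")
    pairs = [item.split("=", 1) for item in query_string_parameters.split("|")]
    joined = series_delim.join(f"{name}{pair_delim}{value}" for name, value in pairs)
    # Avoid doubling the first delimiter when the route already ends with it.
    url = route_url if route_url.endswith(first_delim) else route_url + first_delim
    url += joined
    if first_delim == "/":
        # Assumed: SES style URLs end with a trailing slash.
        url += "/"
    return url


print(append_query_parameters("/music/1234/", "offset=5|limit=5", "/|/|/"))
print(append_query_parameters("/music/1234/", "offset=5|limit=5", "?|&|="))
```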
It is possible to drop the index.cfm when building URLs if you add URL rewriting at the webserver level. This is an issue that Mach-II itself cannot solve directly because it's the job of the webserver to pass the request over to your CFML engine. Without a .cfm file in the URL (and without URL rewriting), the webserver does not pass the request to your CFML engine. So you will need to use a rewriting utility to, at a minimum, get the request to the CFML engine.
Below is how a route is defined in the Mach-II configuration file. You can configure as many routes as you like for your application by adding another parameter to the configuration of the new UrlRoutesProperty.
Simple Route Definition Syntax:
<property name="routes" type="MachII.properties.UrlRoutesProperty">
<parameters>
<parameter name="product">
<struct>
<!-- optional defaults to the current module -->
<key name="module" value="home" />
<key name="event" value="showProduct" />
<!-- optional defaults to the name of the route (in this case "product") -->
<key name="urlAlias" value="p" />
<!-- optional -->
<key name="requiredParameters" value="productId" />
<!-- optional -->
<key name="optionalParameters" value="title:''>default,displayType:simple" />
</struct>
</parameter>
</parameters>
</property>
Verbose Route Definition Syntax (using arrays):
<property name="routes" type="MachII.properties.UrlRoutesProperty">
<parameters>
<parameter name="product">
<struct>
<!-- optional defaults to the current module -->
<key name="module" value="home" />
<key name="event" value="showProduct" />
<!-- optional defaults to the name of the route (in this case "product") -->
<key name="urlAlias" value="p" />
<key name="requiredParameters">
<array>
<element value="productId" />
</array>
</key>
<key name="optionalParameters" >
<array>
<element value="title:''>default" />
<element value="displayType:simple" />
</array>
</key>
</struct>
</parameter>
</parameters>
</property>
Each route is configured inside a parameter as a struct, although a few parameter names are "reserved" (these are detailed below). The parameter name is the route name, which is the key you reference when calling buildRouteUrl(). Each route struct is made up of some required and optional keys.
Key names for routes:
Key Name | Required | Default | Description |
---|---|---|---|
module | optional | defaults to current module | Defines the module name in which the event exists for this route. |
event | required | n/a | Defines the name of the event to which the route maps. |
urlAlias | optional | defaults to name of route | Defines an alias to be used when building the route; otherwise the route name is used. |
requiredParameters | optional | n/a | Defines a comma-separated list or nested array of required arguments for the route. The order of the required args defines the position of each arg in the route. A required arg can also define a default value to be used if it is not passed into buildRouteUrl() . |
optionalParameters | optional | n/a | Defines a comma-separated list or nested array of optional arguments that may be passed when building the route. The order of the optional args defines the position of each arg in the route. Optional arguments must define a default value (using the : syntax) so that if the argument is not passed into buildRouteUrl() there is a default value to use. Use the > syntax to indicate a URL parameter formatter. Default values for optional parameters can be Mach-II expressions (e.g. ${properties.somePropName} ); however, they cannot use event data (e.g. ${event.someEventArg} ) as these defaults are pre-computed when the framework starts up. Use two single quotes to denote an empty string (such as title:'' ); when defining actual string values, do not supply the single quotes (write title:some awesome title ). If you do supply single quotes (such as title:'some awesome title' ), the event-arg will contain the quotes surrounding the string. |
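As a rough illustration of the : and > syntax described in the table, the following Python sketch splits a comma-separated parameter definition into name, default and formatter. The function `parse_parameter_definitions` is hypothetical and only models the rules stated above; in particular it assumes that default values never contain a comma.

```python
def parse_parameter_definitions(definition):
    """Parse e.g. "title:''>default,displayType:simple" into dicts."""
    parameters = []
    for item in definition.split(","):
        item = item.strip()
        formatter = None
        if ">" in item:
            # Everything after the last > names the URL parameter formatter.
            item, formatter = item.rsplit(">", 1)
        # Only the first : separates the name from the default, so a default
        # such as adminConsole:showLogin is kept as a literal.
        name, separator, default = item.partition(":")
        if not separator:
            default = None  # no default defined
        elif default == "''":
            default = ""  # two single quotes denote an empty string
        parameters.append({"name": name, "default": default, "formatter": formatter})
    return parameters


for parameter in parse_parameter_definitions("title:''>default,displayType:simple"):
    print(parameter)
```

Note that a value written with surrounding single quotes, such as title:'some awesome title', keeps its quotes in this sketch, which matches the behaviour the table warns about.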
URL route parameters that are strings can contain characters that are not always compatible inside a URL. Some incompatible characters are spaces, punctuation and foreign accent characters. One solution is to strip and replace these incompatible characters before passing them to the buildRouteUrl() method.
By default, Mach-II 1.9+ ships with a default URL parameter formatter. The default formatter performs the following operations:
- Removes all punctuation from the string
- Replaces all spaces with - dashes (any double/triple spaces are replaced with a single dash)
- Replaces all foreign "accent" characters with equivalent ASCII characters (è accent grave is replaced with e)
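The three operations listed above can be approximated in Python as follows. This is a sketch of the idea, not a port of DefaultUrlParameterFormatter: the function name `format_url_parameter` is hypothetical, the exact character rules of the real formatter may differ, and characters without a Unicode decomposition (such as ß) are simply dropped here.

```python
import re
import unicodedata


def format_url_parameter(value):
    """Approximate the default formatter's three steps on a string."""
    # Replace accented characters with their ASCII base letter by
    # decomposing them and discarding the combining accent marks.
    decomposed = unicodedata.normalize("NFKD", value)
    without_accents = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    # Remove punctuation: anything that is not an ASCII letter, digit or whitespace.
    cleaned = re.sub(r"[^A-Za-z0-9\s]", "", without_accents)
    # Collapse each run of whitespace into a single dash.
    return re.sub(r"\s+", "-", cleaned.strip())


print(format_url_parameter("Crème brûlée,  très bon!"))
```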
You indicate that you want to use a formatter by using the > (pipe) syntax:
<key name="optionalParameters" value="title:''>default" />
This indicates you want to use the default formatter.
You can create your own formatters by extending the MachII.framework.url.AbstractUrlParameterFormatter class. See the DefaultUrlParameterFormatter for how to design your formatter.
You define your formatter like this:
<property name="routes" type="MachII.properties.UrlRoutesProperty">
<parameters>
<parameter name="urlParameterFormatters">
<struct>
<key name="super" value="path.to.your.formatter" />
</struct>
</parameter>
</parameters>
</property>
The URL route property will write out Apache rewrite rules for you:
<property name="routes" type="MachII.properties.UrlRoutesProperty">
<parameters>
<parameter name="rewriteConfigFile">
<!-- Creates file with Apache Rewrite rules for the routes so you can exclude index.cfm -->
<struct>
<key name="rewriteFileOn" value="true|false" />
<key name="filePath" value=".htaccess" />
</struct>
</parameter>
... route parameters ...
</parameters>
</property>
Key names for rewrite config file:
Key Name | Required | Default | Description |
---|---|---|---|
rewriteFileOn | optional | false | Turns the rewrite file on or off. |
filePath | optional | false | By default, the path is computed by running ExpandPath() on rewriteRules_base.cfm for base applications and rewriteRules_moduleNameHere.cfm for modules. If you choose to provide a file path, ExpandPath() will be run on that path. |
Before we get started, it's important to remember that route names cannot be used to announce a route via the browser URL bar. All routes are accessible via the URL alias value only. Route names are only used internally in the framework for ease of use, whereas URL alias values tend to be more complicated as aliases are usually stuffed with keywords for search engine optimization. If you do not define a URL alias when defining a route, the route name doubles as the URL alias (i.e. the route name and URL alias values will be the same). We recommend always defining a URL alias and using the route name as a simple key to look up the route.
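The relationship between route names and URL aliases can be pictured with a small Python sketch. The `ROUTES` dictionary and both helper functions are hypothetical; they only model the rule that incoming URLs are matched by alias and that the route name stands in when no alias is defined.

```python
# Hypothetical route table: "product" defines an alias, "music" does not.
ROUTES = {
    "product": {"event": "showProduct", "url_alias": "p"},
    "music": {"event": "showMusicCategory"},
}


def build_alias_index(routes):
    """Map each URL alias to its route name."""
    index = {}
    for name, route in routes.items():
        # Without an explicit alias, the route name doubles as the alias.
        index[route.get("url_alias", name)] = name
    return index


def find_route(alias_index, first_path_element):
    """Look up an incoming request by alias only; returns None if unknown."""
    return alias_index.get(first_path_element)


alias_index = build_alias_index(ROUTES)
print(find_route(alias_index, "p"))
print(find_route(alias_index, "product"))
print(find_route(alias_index, "music"))
```

In this model /p/... resolves to the product route, while /product/... does not resolve at all because the route name is only an internal key once an alias exists.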
Since URL aliases are usually stuffed with keywords, we have a few suggestions for coming up with aliases for your routes:
- Use relevant keywords in the URL alias that also appear in the page content and meta data (keywords, description, etc.)
- Do not use spaces or punctuation. Prefer dashes (i.e. -) as a replacement for spaces over CamelCasing your URL alias. All major search engines interpret dashes as a space. Underscores (i.e. _) are the next best option if you do not want to use dashes. Use URL parameter formatters (available in Mach-II 1.9+) to format specified URL parameters (see above on how to use).
- The jury is out on whether including common words like the, and, or, of, etc. improves or detracts from search engine optimization.
- Your URL alias may be penalized for too many keywords (i.e. too long). Try to keep URL aliases at a reasonable length. Remember users see these URLs as well as search engines so be nice to both.
- Remember that URL routes map to concrete event handlers, so it is still possible to access pages via both a route and normal URLs. We debated whether to restrict access to plain URLs when a corresponding URL route exists, but that proved problematic in a multitude of ways. For example, if you were retrofitting an older application with URL routes, any inbound links would 404 and be lost. We recommend using canonical links (i.e. <link rel="canonical" href="/index.cfm/my-special-route/"/>) for all pages that are accessed via a URL route. For more information on canonical links and what they can do for your site, there is a great video produced by Google explaining canonical links (http://www.google.com/support/webmasters/bin/answer.py?hl=en&answer=139394).
This was probably discussed before, but would it be easy to drop the index.cfm altogether?
Yes, you can drop the index.cfm if you have control over URL rewriting at the webserver level. This is an issue that Mach-II itself cannot solve directly because it's the job of the webserver to pass the request over to your CFML engine. Without a .cfm file, the webserver wouldn't process the request with CFML, so you need to use a rewriting utility to at minimum get the request to the CFML engine. This functionality will be compatible with what you suggest, with the just-mentioned caveat.
It would be VERY useful for SEO issues and maintaining immaculate URLs to be able to append additional URL params as a normal query string.
As the route is built, all key values would be placed in their proper order according to the route definition, but all extra name/value pairs would be appended as a query string.
THIS: /music/?offset=5&limit=5 NOT THIS: /music/5/5/
Peter and I were discussing this last comment. We are thinking we could add a third argument to buildRouteUrl() which could contain any extra URL params you want tacked onto the end. Please comment if anyone thinks this would address the issue.
I think adding extra URL params as a third argument would be perfect.
This makes sense. What about being able to configure my urlDelimiters to be able to do:
/music/?offset=5&limit=5
OR
/music/5/5/
maybe I set it as:
<property name="urlDelimiters" value="/|/|/|?|&|=" />
<property name="urlDelimiters" value="/|/|/|/|/|/" />
Another thought - With : as the delimiter for the default value. What if I have this:
<property name="moduleDelimiter" value=":" />
And I'm doing this:
<key name="optionalArgs" value="returnEvent:adminConsole:showLogin" />
Actually, if you are using the query string parameters for additional parameters that are not part of the route definition, we won't be able to support /music/5/5/ unless you defined offset and limit as optional parameters for the route; otherwise we won't know what 5 and 5 mean (they could be keys called foo or bar). If the route is this:
<struct>
<key name="event" value="showMusicCategory" />
<key name="requiredArgs" value="catId" />
</struct>
buildRoute("music", "catId=1234")
Would output:
/music/1234/
But if you add in query string parameters, buildRoute("music", "catId=1234", "offset=5|limit=5"), we can't use positional elements (without the key name) for the query string parameters because they aren't part of the optional arguments for the route definition. The output would either be:
/music/1234/offset/5/limit/5/
Or using normal query string parameters:
/music/1234/?offset=5&limit=5
Does that make sense? I know what you are getting at in regards to the url delimiters; however, if you have them set to /|/|/ then you would just get /music/1234/offset/5/limit/5 as the route output, or if they are set as ?|&|= then the output of the route would be /music/1234/?offset=5&limit=5.
Kurt, do we need to set up a way to configure the route delimiter when building the route output? I don't know if that is necessary. I think it should just use / unless something different is set in the urlDelimiters.
In regards to the module delimiter, it doesn't matter here. The : in the optional arguments is just an indicator that a default value follows. The parser should take adminConsole:showLogin as a literal until a comma is found, which indicates a second optional argument.
However, I pose that you might do this:
<key name="optionalArgs" value="returnEvent:showLogin,returnModule:adminConsole" />
A SES/buildURL() observation from my last project:
If I'm doing url rewriting at the web server so everything's http://www.domain.com/event/about.contact/ I'd love to set:
<property name="eventParameter" value="" />
And get: domain.com/about.contact/ instead of domain.com//about.contact/.
The rewriter's regex would add the event back in so it's domain.com/index.cfm/event/about.contact when it hits the CF engine.
Maybe being able to set this would take care of that:
<property name="eventParameter" value="" />
<property name="urlDelimiters" value="|/|/" />
This is definitely a power user configuration and requires a robust event naming convention and/or careful rewrite rules, but would make for cleaner urls.
FYI, @anonymous 2/27/2009: your suggestion has been implemented with the addition of the <property name="urlExcludeEventParameter" value="true"/> property in your XML configuration file. See the documentation above for more information.