REST Web‐Services - ARCAD-Software/AFS GitHub Wiki

HTTPS Activation

This article describes how to configure the security layer over HTTP (i.e. HTTPS) to access the AFS Web-Services through an encrypted, secured connection.

To activate HTTPS you need to install a certificate together with its private key. To obtain this certificate you have 3 options:

Obtain a certificate from a Certificate Authority (CA).
Use the automatic Let's Encrypt certificate generation.
Generate a home-made certificate, a so-called self-signed certificate.

The first option is recommended for production installations: these certificates require a process that can take a few days and require the client to authenticate itself to the Certificate Authority. Let's Encrypt certificates can be generated automatically and are free of charge, but they have a short validity period and require your server to be reachable from the internet. Finally, self-signed certificates are less secure and must be manually "accepted" by the client RCP, which may be seen as a security threat.

Server Side Configuration

1. HTTP/S Server configuration

Whatever solution you choose, you first have to complete the configuration of the "HTTP Server" in the configuration file:

Define the public "domain name" as users will type it in the RCP. If you plan to use a self-signed certificate, it may be a local domain name (e.g. the server hostname, as set by default); otherwise you have to set an internet DNS name routed to the current server.
Set the TCP port used for the HTTPS protocol. This port number must be different from the HTTP port. The standard HTTPS port number is 443; you can use it if your server allows it.

Once the HTTPS server is configured, you will be able to shut down the HTTP server by setting its port to zero (0). Do not do it now, or you will be unable to configure and test the HTTPS connection.

2. Certificate Installation

Once the certificate has been generated by the Certificate Authority (CA), you have to store it in a Key Store. To do so you can use one of several free or open-source tools (like KS Explorer).

The Key Store needs to be accessible from the server file system. You can install it with the following process:

Stop the server and edit the configuration file ./configuration/osgi.cm.ini.
There, activate the secured layer (see the previous chapter).
Set the file path of your Key Store, the password required to open it, and its type (JKS, P12 or any other type supported by the Java VM).
Add the alias of your private key and its associated password.
Save the configuration changes and restart the server.
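Before restarting the server, it can help to check that the Key Store opens with the configured password and really contains a private key under the configured alias. The following Java sketch does this with the standard java.security.KeyStore API; the class and method names are hypothetical and are not part of AFS. Note that the Java type name for a P12 store is "PKCS12".

```java
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.security.GeneralSecurityException;
import java.security.Key;
import java.security.KeyStore;
import java.security.PrivateKey;

/** Hypothetical helper: checks a Key Store before referencing it from osgi.cm.ini. */
public class KeyStoreCheck {

    /**
     * Returns true if the store at 'path' opens with 'storePassword' and holds a
     * private key under 'alias' that can be recovered with 'keyPassword'.
     * Throws if the file cannot be read or if a password is wrong.
     */
    public static boolean hasUsablePrivateKey(String path, String type, char[] storePassword,
            String alias, char[] keyPassword) throws IOException, GeneralSecurityException {
        KeyStore ks = KeyStore.getInstance(type); // "JKS" or "PKCS12" (the Java name for P12)
        try (InputStream in = new FileInputStream(path)) {
            ks.load(in, storePassword);
        }
        if (!ks.isKeyEntry(alias)) {
            return false; // alias missing, or only a trusted certificate entry
        }
        Key key = ks.getKey(alias, keyPassword);
        return key instanceof PrivateKey;
    }

    public static void main(String[] args) throws Exception {
        // Usage: java KeyStoreCheck <path> <type> <storePassword> <alias> <keyPassword>
        boolean ok = hasUsablePrivateKey(args[0], args[1], args[2].toCharArray(),
                args[3], args[4].toCharArray());
        System.out.println(ok ? "Private key found" : "No private key under alias " + args[3]);
    }
}
```

A wrong store password surfaces as an IOException when loading, and a wrong key password as an UnrecoverableKeyException when reading the key.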

The server restarts in HTTPS mode, and you should be able to reconnect with an https:// URL. In a production environment, it is recommended to turn off the HTTP mode by setting its port number to zero (0).

Client Side Configuration

On the client side (RCP), no operation is required if you use a certificate generated by a CA or by Let's Encrypt: these certificates are automatically recognized as valid by Java. Self-signed certificates are not, so you will have to authorize them manually on each client host.

If you use a self-signed certificate, it must be installed in the client trust store. You can accept the certificate at the first connection. Be careful about accepting untrusted certificates in a non-secure environment (like the internet, or anywhere a network attack is possible, such as DNS cache poisoning).

The first time the RCP application tries to connect to the server over a secured connection, the user is offered two ways to install the certificate:

either by accepting the certificate currently offered by the server, with the underlying hacking risks;
or by loading the certificate from a file. This file has to be provided by the administrator who generated the server certificate (e.g. the "server_public.crt" file generated by the server).

This step is repeated if the certificate becomes obsolete.
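For a Java client other than the RCP (a test tool, for instance), the same trust decision can be made programmatically by loading the administrator's certificate file (e.g. "server_public.crt") into an in-memory trust store. This is a sketch using standard JSSE classes, not the mechanism the RCP uses internally; the class name and the "afs-server" alias are hypothetical.

```java
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.security.GeneralSecurityException;
import java.security.KeyStore;
import java.security.cert.Certificate;
import java.security.cert.CertificateFactory;
import javax.net.ssl.SSLContext;
import javax.net.ssl.TrustManagerFactory;

/** Hypothetical helper: trusts exactly one self-signed server certificate. */
public class SelfSignedTrust {

    /** Builds an in-memory trust store holding the single certificate read from certStream (PEM or DER). */
    public static KeyStore trustStoreFor(InputStream certStream)
            throws IOException, GeneralSecurityException {
        CertificateFactory cf = CertificateFactory.getInstance("X.509");
        Certificate cert = cf.generateCertificate(certStream); // CertificateException if unreadable
        KeyStore trustStore = KeyStore.getInstance(KeyStore.getDefaultType());
        trustStore.load(null, null); // start from an empty store, nothing written to disk
        trustStore.setCertificateEntry("afs-server", cert); // hypothetical alias
        return trustStore;
    }

    /** Creates an SSLContext whose trust decisions are based only on the given trust store. */
    public static SSLContext sslContextFor(KeyStore trustStore) throws GeneralSecurityException {
        TrustManagerFactory tmf =
                TrustManagerFactory.getInstance(TrustManagerFactory.getDefaultAlgorithm());
        tmf.init(trustStore);
        SSLContext ctx = SSLContext.getInstance("TLS");
        ctx.init(null, tmf.getTrustManagers(), null); // no client certificate, default randomness
        return ctx;
    }

    public static void main(String[] args) throws Exception {
        // Usage: java SelfSignedTrust server_public.crt
        try (InputStream in = new FileInputStream(args[0])) {
            SSLContext ctx = sslContextFor(trustStoreFor(in));
            System.out.println("SSLContext ready: " + ctx.getProtocol());
        }
    }
}
```

The resulting context can be passed to HttpsURLConnection.setSSLSocketFactory(ctx.getSocketFactory()) or to HttpClient.newBuilder().sslContext(ctx). Host name verification still applies, so the certificate must match the domain name configured on the server.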

Long duration operation web-services

This chapter addresses Long Operations Governance, or LOG for short. It offers some implementation hints from the web-service perspective. The current implementations offer a pull mechanism, which requires the HTTP client to explicitly call the web-services to obtain accurate information about the state of the long-running operation.

Important Note: Do not confuse LOG with logging. Logging is a way for an application to record events that may or may not occur, in order to diagnose problems; this requires a global view of the system. LOG may itself generate log entries, but LOG communication does not target the same audience: LOG must provide progression information about the operation, along with an estimation of its normal end. These are predictable messages related to a normal, deterministic execution, and they have nothing to do with unpredictable error messages. Mixing logging and LOG would confuse the end-user with unreadable events, and would give a really poor diagnostic of the server state, because the other modules running on the server do not merge their logging with this operation.

When dealing with a REST web-service, the time available to generate the response is limited by the HTTP protocol. To manage any operation whose computation may exceed 2 seconds, you have to offer an alternative to the requester. Just like the whole REST API logic, this alternative must respect the HTTP protocol. The basic implementation of this pattern is to provide the user with a "ticket" giving access to another service, which indicates whether the operation is still in progress. Generating the ticket is a POST request, and requesting information about the corresponding progress is a GET request.

The following implementations create services with this model: a parent resource accepts the POST method and starts a new operation each time it is called; the response includes an identifier used to call a child resource (i.e. parent/{identifier}) to manage the running operation. This approach requires the HTTP client to call the child resource repeatedly to get accurate progression information, which may overload the HTTP server if too many operations are running at the same time. Some benchmark tests may be required here.
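The ticket pattern can be sketched in plain Java, without HTTP, to show the moving parts: a POST-like call starts the work in the background and immediately returns an identifier, and a GET-like call reads the progression for that identifier. Everything here (the class name, the -1 convention for unknown tickets, the IntConsumer used as a minimal progress callback) is a hypothetical illustration, not the AFS implementation.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.atomic.AtomicLong;
import java.util.function.Consumer;
import java.util.function.IntConsumer;

/** Hypothetical in-memory sketch of the "ticket" pattern (POST starts, GET polls). */
public class TicketRegistry {
    private final AtomicLong nextId = new AtomicLong();
    private final Map<Long, AtomicInteger> progress = new ConcurrentHashMap<>();
    private final Map<Long, Future<?>> tasks = new ConcurrentHashMap<>();
    private final ExecutorService pool = Executors.newCachedThreadPool();

    /** Like "POST parent": starts the operation in the background and returns its ticket at once. */
    public long start(Consumer<IntConsumer> operation) {
        long id = nextId.incrementAndGet();
        AtomicInteger percent = new AtomicInteger(0);
        progress.put(id, percent);
        // The operation reports its percentage through the IntConsumer it receives.
        tasks.put(id, pool.submit(() -> operation.accept(percent::set)));
        return id;
    }

    /** Like "GET parent/{id}": current percentage, or -1 for an unknown ticket. */
    public int percent(long id) {
        AtomicInteger p = progress.get(id);
        return p == null ? -1 : p.get();
    }

    /** True once the background work has finished (normally, with an error, or cancelled). */
    public boolean isDone(long id) {
        Future<?> f = tasks.get(id);
        return f != null && f.isDone();
    }

    /** Stops accepting new operations; running ones are allowed to finish. */
    public void shutdown() {
        pool.shutdown();
    }
}
```

In a real service the unknown-ticket case would map to a 404 response, and finished entries would need to be purged to bound memory use.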

From the operation implementation perspective, AFS uses the IProgressMonitor interface from Eclipse Equinox. It allows following the progression of the operation, with a percentage of completion and sub-task labels, and supports a user cancellation that respects the execution of the operation (i.e. the cancellation lets the operation implementation stop the current work cleanly).
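The cooperative cancellation described above means that the operation itself polls the monitor and stops at a safe point. To keep the sketch runnable without the Equinox jars, it uses a small stand-in interface whose methods mirror a subset of org.eclipse.core.runtime.IProgressMonitor (beginTask, subTask, worked, isCanceled, done); with Equinox on the classpath, the real interface would be used instead. The class name and the item-processing loop are hypothetical.

```java
import java.util.List;

/** Hypothetical operation showing cooperative cancellation through a progress monitor. */
public class CancellableOperation {

    /** Stand-in for the subset of org.eclipse.core.runtime.IProgressMonitor used below. */
    public interface Monitor {
        void beginTask(String name, int totalWork);
        void subTask(String name);
        void worked(int work);
        boolean isCanceled();
        void done();
    }

    /** Processes the items one by one and stops at the next item boundary once cancelled; returns the count processed. */
    public static int run(List<String> items, Monitor monitor) {
        monitor.beginTask("Processing items", items.size());
        int processed = 0;
        try {
            for (String item : items) {
                if (monitor.isCanceled()) {
                    break; // leave the work in a consistent state instead of aborting mid-item
                }
                monitor.subTask("Processing " + item);
                // ... the actual work on 'item' would go here ...
                processed++;
                monitor.worked(1);
            }
        } finally {
            monitor.done(); // always close the monitor, even after cancellation or failure
        }
        return processed;
    }
}
```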

IProgressMonitor usages

AFS offers multiple implementations of this Equinox interface. Even if they are not directly related to this topic, they may help to test and use exactly the same operation implementation as the one used in the following web-services.

You will find more information about the usage of the IProgressMonitor here and here.

The com.arcadsoftware.osgi bundle offers the following classes in the package of the same name, com.arcadsoftware.osgi:

ConsoleProgressMonitor: redirects the progression information to the Equinox OSGi Console.
LoggedProgressMonitor: an implementation that uses the aggregated log to store the progression messages. Note that this monitor may be wrapped around any other monitor.
SysOutProgressMonitor: prints the progression of the operation in a terminal shell, with a progress bar; usable by CLI programs.
TimerProgressMonitor: may be used with another monitor; this implementation adds time information about the current operation, with a start date and an estimated end date.
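As an illustration of the kind of estimate a timer-based monitor can provide, the simplest approach is linear extrapolation: if part of the work took a given time, the rest is assumed to proceed at the same average rate. This is a hypothetical sketch; the algorithm actually used by TimerProgressMonitor is not documented here.

```java
/** Hypothetical sketch: linear end-date estimation, as a timer-style monitor might do it. */
public final class EndDateEstimator {
    private EndDateEstimator() {}

    /**
     * If 'worked' of 'total' units took (nowMillis - startMillis), assume the remaining
     * units proceed at the same average rate. Returns -1 when no estimate is possible yet.
     */
    public static long estimateEndMillis(long startMillis, long nowMillis, int worked, int total) {
        if (worked <= 0 || total <= 0) {
            return -1; // nothing done yet: no basis for an estimate
        }
        long done = Math.min(worked, total); // never extrapolate beyond 100 %
        long elapsed = nowMillis - startMillis;
        return startMillis + elapsed * total / done;
    }
}
```

For example, if 25 of 100 units are done 2 seconds after the start, the estimated end is 8 seconds after the start.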

Map operation progression to Metadata entity

The c.a.metadata bundle offers an IProgressMonitor implementation that stores the progression information in Metadata Entities (two of them, in fact). This approach permanently stores the progression and the task labels of the execution, which may cause storage problems if there are too many such operations!

The EntityProgressMonitor class uses 2 entities to store the progression log; the whole configuration of this implementation is done through these entities:

A "Header" entity, used to store the current state of the operation, with the following attributes:
1. state: either an Integer (possibly a reference to a list of states: 0 = waiting, 1 = running, 2 = terminated, 3 = cancelled), or a String, in which case its value is the localized state code.
2. startdate: a Date, the moment the process actually started.
3. duration: either a Date, storing the estimated end date during execution or the actual end date when the process is done, or an Integer, storing the duration of the process in milliseconds (an estimation during execution, the actual duration when done).
4. progress: an Integer, with a value between 0 and the attribute length (100 by default), indicating the actual progression. If the length of this attribute is zero or negative, the value is a percentage.
5. stage: a String storing the latest task (or sub-task) name.
A "Details" entity, used to log the progression of the operation. It uses the same attributes as the Header entity, plus:
1. header: an Integer referencing the Header data.
2. substage: either a Boolean indicating that the "stage" text is a sub-task message, or a String used to store the sub-task message itself. In that case, the "stage" attribute always contains the global task message.
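The value conventions of the Header entity can be captured in a few lines. This hypothetical Java sketch maps the suggested integer state codes and converts a stored "progress" value into a percentage, assuming "length" is the declared length of the progress attribute; it is not part of the c.a.metadata API.

```java
/** Hypothetical helpers for the Header entity conventions; not the c.a.metadata API. */
public final class HeaderValues {
    private HeaderValues() {}

    /** Integer codes suggested for the Header "state" attribute. */
    public enum State {
        WAITING(0), RUNNING(1), TERMINATED(2), CANCELLED(3);

        public final int code;

        State(int code) {
            this.code = code;
        }

        public static State fromCode(int code) {
            for (State s : values()) {
                if (s.code == code) {
                    return s;
                }
            }
            throw new IllegalArgumentException("Unknown state code: " + code);
        }
    }

    /** Converts a stored "progress" value to a percentage, given the attribute length. */
    public static int toPercent(int progress, int length) {
        if (length <= 0) {
            return progress; // no usable length: the stored value already is a percentage
        }
        return (int) ((long) progress * 100 / length); // long arithmetic avoids overflow
    }
}
```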

All of these attributes are optional. If none are present, or if the entity is not set, the corresponding information is dropped. If your entities use different attribute names, each of them must carry the metadata "replace" with the name of the replaced attribute. For instance, if your entity uses the attribute "endDate" to store the estimated end date of the process, define a metadata tag "replace" with the value "duration". These strings are case-sensitive.
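The "replace" rule amounts to a lookup from the expected attribute name to the attribute actually declared in the entity. In this hypothetical sketch, a map from each declared attribute name to the value of its "replace" metadata (or null) stands in for the metadata API, and an attribute carrying the expected name directly is assumed to take precedence.

```java
import java.util.Map;

/** Hypothetical sketch of the "replace" metadata lookup; not the c.a.metadata code. */
public final class ReplaceResolver {
    private ReplaceResolver() {}

    /**
     * Returns the declared attribute holding 'expectedName': the attribute with that exact
     * name if present, else the one whose "replace" metadata equals it, else null (dropped).
     * Keys are declared attribute names; values are their "replace" metadata, or null.
     */
    public static String resolve(Map<String, String> replaceTags, String expectedName) {
        if (replaceTags.containsKey(expectedName)) {
            return expectedName;
        }
        for (Map.Entry<String, String> entry : replaceTags.entrySet()) {
            if (expectedName.equals(entry.getValue())) { // case-sensitive comparison
                return entry.getKey();
            }
        }
        return null;
    }
}
```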

The Header data must be created before using this class. Only one Header data is used during the operation; it is updated each time the progression changes. The Details data are created (and linked to the Header data) during the process. A process may produce zero to many Details data, so a way to clear these data must be provided to end-users.

To use this class, create a new instance with one of the creation methods, according to your usage (with or without a Header entity, a Details entity...), and then use it as a normal IProgressMonitor. For instance, this implementation may be used directly in an OSGi event fired during the creation of the Header entity object: use an asynchronous event, get the Header identifier from the event properties, create the monitor and execute the operation. The monitoring is then done through the Metadata Entities web-services.

Use a dedicated Restlet implementation

The c.a.rest bundle offers another implementation of this pattern with a Restlet class: ProgressMonitoredRestlet. This class is meant to be extended and mapped to a dedicated end-point. It records the progression for a certain period of time, without using any storage facility. This implementation may overload the application memory if too many operations are run in a short amount of time. It does not use any other AFS facility: it is a purely independent web-service. It is completely agnostic and only manages the information related to the progression of the operation; no result is transmitted through this web-service, and no semantics are associated with the operation type.

The class accepts POST and GET calls on this end-point: a POST starts the execution of a new operation, and a GET returns information about the running operations. It also accepts calls on the child path /id, including a concatenation of multiple identifiers like /id+id+id+..., where id is the identifier of a previously launched operation. The GET and DELETE methods are accepted on this child path; the DELETE method is used to cancel the operation. The GET method retrieves the current state of the operation progression; the information can be transmitted in XML or in JSON. The XML Schema can be obtained directly from the web-service. The JSON format is the following:

A GET call on the parent path, or with a concatenation of identifiers, returns a JSON array.
A GET with a single identifier returns a JSON object with the following keys:
- id: the numerical identifier of the corresponding operation.
- percent: an integer value from 0 to 100, representing the current progression of the operation.
- cancelled: a boolean value indicating that this operation has been cancelled.
- startdate: an ISO, GMT-based representation of the actual start date of the operation; it may be omitted if the operation has not started yet.
- enddate: provided only if the operation has started. While the operation is running, this date is an estimation of the end date; once the operation has ended, it is the actual end date.
- ended: a boolean, provided only if the operation has started, equal to true if the operation is terminated. Note that a cancelled operation may still be in progress, depending on the implementation of the program.
- tasks: a JSON array of objects with the following keys; note that, depending on the configuration of the web-service, only the latest sub-task label may be included in this array:
  - name: the label of the task.
  - percent: the percentage of the progression of the global operation when this task was started.
  - startdate: the start date of the task; note that a task ends when the next task starts.
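As an illustration of the documented formats, the following hypothetical sketch splits a child path segment like "12+15+20" into identifiers and renders a single operation as JSON with the keys listed above (without the "tasks" array). It assumes ISO-8601 UTC timestamps such as "2019-03-01T10:00:00Z"; the exact date format produced by ProgressMonitoredRestlet may differ. Note that "+" is literal in a URL path but is decoded as a space by form decoders such as URLDecoder, so the segment should be split before any such decoding.

```java
import java.time.Instant;
import java.time.format.DateTimeFormatter;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helpers mirroring the documented child path and JSON keys; not the AFS code. */
public final class OperationDocuments {
    private OperationDocuments() {}

    /** Splits a child path segment such as "12+15+20" into positive operation identifiers. */
    public static List<Long> parseIds(String segment) {
        List<Long> ids = new ArrayList<>();
        for (String part : segment.split("\\+")) {
            long id = Long.parseLong(part); // NumberFormatException on malformed input
            if (id <= 0) {
                throw new IllegalArgumentException("Identifiers must be positive: " + id);
            }
            ids.add(id);
        }
        return ids;
    }

    /** Builds the single-operation JSON object (the "tasks" array is left out for brevity). */
    public static String toJson(long id, int percent, boolean cancelled,
            Instant start, Instant end, boolean ended) {
        StringBuilder sb = new StringBuilder();
        sb.append("{\"id\":").append(id)
          .append(",\"percent\":").append(percent)
          .append(",\"cancelled\":").append(cancelled);
        if (start != null) { // startdate, enddate and ended only appear once the operation started
            sb.append(",\"startdate\":\"").append(DateTimeFormatter.ISO_INSTANT.format(start)).append('"');
            if (end != null) {
                sb.append(",\"enddate\":\"").append(DateTimeFormatter.ISO_INSTANT.format(end)).append('"');
            }
            sb.append(",\"ended\":").append(ended);
        }
        return sb.append('}').toString();
    }
}
```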

To use this class, set it in a REST branch just like any other resource. You only have to implement the startOperation abstract method. This method receives the parameters sent to the web-service through a URL-encoded form, the current user, and the newly created monitor. In this method you have to:

Launch the actual operation execution in another thread, using the given monitor. To do so you can use the delayedRun method, which ensures that the operation starts after a small delay.
Return the unique identifier associated with this operation. This numerical identifier must be a positive integer; it can be associated with the result of the operation or with any semantics you want, but you will then have to provide the corresponding Metadata Entity or web-service elsewhere. Alternatively, you can use the getNewId() method, which generates a new unique number each time it is called. If the method returns a null or negative identifier, the operation is removed and an HTTP error code 503 is returned.
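The two obligations of startOperation can be sketched with standard java.util.concurrent classes: a delayed start on another thread, so that the POST response carrying the identifier can be returned first, and a generator of unique positive identifiers in the spirit of getNewId(). This is not the AFS delayedRun or getNewId code; the class name and the pool size are assumptions.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicLong;

/** Hypothetical sketch of a delayed operation start and an identifier generator. */
public final class DelayedStarter {
    private static final AtomicLong IDS = new AtomicLong();

    // Assumed pool size: at most 4 operations run at the same time, later ones wait.
    private final ScheduledExecutorService scheduler = Executors.newScheduledThreadPool(4);

    /** Unique, strictly positive identifier for each call. */
    public static long newId() {
        return IDS.incrementAndGet();
    }

    /** Runs the operation on another thread after a short delay, so the POST response can be sent first. */
    public ScheduledFuture<?> runLater(Runnable operation, long delayMillis) {
        return scheduler.schedule(operation, delayMillis, TimeUnit.MILLISECONDS);
    }

    /** Stops accepting new operations; already scheduled ones still run. */
    public void shutdown() {
        scheduler.shutdown();
    }
}
```

A fixed pool bounds the number of operations running at once, which is one way to limit the memory and CPU pressure of many concurrent long operations.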

This is the minimal implementation required to execute these operations, but you can also override the following methods:

onDelete: checks whether the current user is allowed to cancel the operation. By default, only the user who launched the operation can cancel it.
onRead: checks whether the current user can see the state of the given operation. By default, a user can only see the operations he launched himself.
getCacheduration: changes the duration (in milliseconds) of the cache used to store the ended ProgressMonitors. A zero or negative value disables the purge.
isKeepTaskTrace: indicates whether all the tasks, or only the current/last task label, must be recorded and communicated. The default value is true: all the tasks are recorded.
isSortTasksAscending: defines the order of the tasks in the returned documents. The default value is false: the tasks are presented in reverse chronological order, newest first.
getOperationXMLTag: changes the default "operation" tag name in XML documents.
getOperationListXMLTag: changes the default "list" tag name in XML documents.
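The cache duration rule can be illustrated with a small purge routine: ended operations whose end time is older than the configured duration are dropped, and a zero or negative duration disables the purge. The map of identifiers to end times is a hypothetical stand-in for the Restlet's internal storage.

```java
import java.util.Iterator;
import java.util.Map;

/** Hypothetical sketch of the ended-operation cache purge; not the ProgressMonitoredRestlet code. */
public final class EndedCachePurge {
    private EndedCachePurge() {}

    /**
     * Removes the entries (operation id -> end time in ms) that ended more than
     * cacheDuration ms before nowMillis. Returns the number of removed entries.
     */
    public static int purge(Map<Long, Long> endTimes, long nowMillis, long cacheDuration) {
        if (cacheDuration <= 0) {
            return 0; // purge disabled: ended monitors are kept
        }
        int removed = 0;
        Iterator<Map.Entry<Long, Long>> it = endTimes.entrySet().iterator();
        while (it.hasNext()) {
            Map.Entry<Long, Long> entry = it.next();
            if (nowMillis - entry.getValue() > cacheDuration) {
                it.remove(); // ended longer ago than the cache duration
                removed++;
            }
        }
        return removed;
    }
}
```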

Note that the ProgressMonitoredRestlet class implements Closeable: the close() method should be called when the Restlet is detached from the branch, to avoid resource leaks.

The "Web" Bundle

Or how to integrate a static web-site into the REST-oriented web-service HTTP server...

The AFS bundle com.arcadsoftware.server.web implements a simple mechanism that allows any fragment to serve static web pages from the Web-Service HTTP server.

To do so, the fragment just needs to be a fragment of this bundle, with the following line in its MANIFEST.MF:

Fragment-Host: com.arcadsoftware.server.web;bundle-version="[1.0.0,2.0.0)"

It must also include a /web folder. Every sub-folder will be accessible through the HTTP server, so a file like /web/tests/test.html will be accessible through a URL like http://localhost:5252/tests/test.html.

Default web-root folder and sub-folders:

If multiple fragments are attached to the com.arcadsoftware.server.web bundle, or if the /web folder contains multiple sub-folders, you can define one of these sub-folders as the default "webroot". To do so, add the corresponding property to the OSGi configuration. If you don't, the first loaded sub-folder will be duplicated as the / URL. To avoid this, set an empty value to the configuration property.

Be careful with the sub-folder names and with the content of the root sub-folder, as they may interfere with the other web-services.

If you want to integrate the static bundle as a sub-path of an existing web-service, you have to name your folder after the web-service path plus the sub-folder name, with all folder names separated by dots (.) instead of slashes (/).

For instance, let's say that there is a classical web-service named /test, and for some reason I want to integrate a static graphical interface for this service at the URL /test/ui/. If I put my static pages in the folder path:

 /web/test/ui/
              index.html

The "test" sub-folder will be in routing conflict with the /test web-service! To avoid this, I have to create a sub-folder named test.ui, as follows:

 /web/test.ui/
              index.html

Here, no static folder "test" is routed, only a "test/ui" path, so there is no routing conflict.
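The naming rule, a /web sub-folder name whose dots become URL path separators, can be expressed as a small function. This is a hypothetical illustration of the rule described above, not the code of com.arcadsoftware.server.web.

```java
/** Hypothetical illustration of the /web sub-folder to URL path naming rule. */
public final class WebFolderMapping {
    private WebFolderMapping() {}

    /** URL path prefix under which a /web sub-folder is served: dots become slashes. */
    public static String urlPrefix(String folderName) {
        return "/" + folderName.replace('.', '/') + "/";
    }

    /** Full URL path for a file inside that sub-folder, e.g. ("test.ui", "index.html") gives "/test/ui/index.html". */
    public static String urlPath(String folderName, String relativeFile) {
        return urlPrefix(folderName) + relativeFile;
    }
}
```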

OSGi Configuration:

The configuration of the Server Web bundle is located in the file com.arcadsoftware.server.web.cfg, or in the [ Web ] section of the file osgi.cm.ini.

There are three parameters:

disabled (Boolean): set this parameter to true to completely disable access to the web resources on this server. The default value is false.
secure (Boolean): if this parameter is set to true, the static web resources are put in the secured part of the web-server, which requires user authentication before accessing the resources. As a web front-end program generally provides, in its interface, a way to manage user authentication (i.e. a connection dialog), the static resources themselves must usually be accessible without authentication. This depends on how the web front-end program is implemented, and should not be modified in a production environment. The default value is false.
webroot (String): generally, the first folder defined as a container of static web resources is used as the default web path (i.e. the root of the URL path, "/"). If more than one folder is defined, this default path may be selected randomly depending on the installation. Setting an empty value to this parameter disables the "root path" for all static web resource containers.
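Assuming the .cfg file uses the usual java.util.Properties syntax (as OSGi .cfg files typically do), the three parameters and their defaults can be read as in the following hypothetical sketch. It keeps the difference between an absent webroot (null: the first loaded folder becomes the root) and an empty one (no root path).

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.Properties;

/** Hypothetical reader for the Server Web bundle parameters; not the bundle's own code. */
public final class WebConfig {
    public final boolean disabled;
    public final boolean secure;
    /** null when absent (first loaded folder is the root); "" disables the root path. */
    public final String webroot;

    private WebConfig(boolean disabled, boolean secure, String webroot) {
        this.disabled = disabled;
        this.secure = secure;
        this.webroot = webroot;
    }

    /** Parses .cfg text in java.util.Properties syntax, applying the documented defaults. */
    public static WebConfig parse(String cfgText) throws IOException {
        Properties props = new Properties();
        props.load(new StringReader(cfgText));
        return new WebConfig(
                Boolean.parseBoolean(props.getProperty("disabled", "false")),
                Boolean.parseBoolean(props.getProperty("secure", "false")),
                props.getProperty("webroot")); // null if the key is absent
    }
}
```

For instance, a file containing the two lines "disabled=false" and "webroot=" keeps the web resources enabled but disables the root path.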

Remarks:

The static resources are attached by default to the "not secured" part of the web-service, i.e. no authentication is required to access these resources. You can switch them to the secured branch by setting the configuration property (see Web Pages bundle configuration).
As the sub-folder name is used as part of the URL path, it may conflict with other web-services, so this folder name must be unique.
If you define a "webroot" folder (see Web Pages bundle configuration), the name conflict concerns all the static resources of this folder.

Errors management:

If an HTTP error is thrown while serving the static pages, for any request addressed to a sub-folder related to the static pages, an error page can be served instead of the default HTTP response. To do so, add an .html or .xhtml file named after the error code in the sub-folder of the /web folder of the bundle fragment.

In practice, only error 404 can be routed like this, so you can only add a 404.html or 404.xhtml page. (The current implementation can only throw a 500 error if the OSGi platform does not have access to the file system, or a 401 error, but the latter is thrown by other bundles before the route is resolved by the com.arcadsoftware.server.web bundle.)

These files must use the UTF-8 charset, and only the HTML and XHTML media types are currently supported.

For instance, if the bundle fragment contains the files:

/web/test/index.html
/web/test/404.html

A call to the URL http://localhost:5252/test/test.html will serve the content of the 404.html file.
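The lookup described above (an error page named after the status code, in .html or .xhtml, in the sub-folder that received the request) can be sketched as follows. The set of file names stands in for the fragment's /web sub-folder, and checking .html before .xhtml is an assumption.

```java
import java.util.Optional;
import java.util.Set;

/** Hypothetical sketch of the error page lookup; not the com.arcadsoftware.server.web code. */
public final class ErrorPageLookup {
    private ErrorPageLookup() {}

    /** Returns the error page file to serve for 'status' from the given sub-folder content, if any. */
    public static Optional<String> find(Set<String> filesInFolder, int status) {
        for (String extension : new String[] {".html", ".xhtml"}) {
            String candidate = status + extension; // e.g. "404.html"
            if (filesInFolder.contains(candidate)) {
                return Optional.of(candidate);
            }
        }
        return Optional.empty(); // fall back to the default HTTP error response
    }
}
```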

HTTP Query Length Limit

This article addresses the query size limit in Restlet web-services. Depending on the client and server components used to access a REST web-service, there could be some limitations on the HTTP request header, more precisely on the URL length.

Such a limitation does not exist in the HTTP protocol itself (see RFC 7230), but it may be imposed by the HTTP client, the server, or even a proxy or an antivirus that parses the HTTP requests. The HTTP specification recommends supporting request lines of at least 8000 bytes. For instance, the Jetty default configuration is 8kB.

Note that in our current configuration (03/2019) the header size limit is set to 16kB with the Jetty implementation. This limitation can change, depending on the server load or on any proxy in between.

Some tests have shown that there is no hardware limitation under 256kB. However, a limit set too high may allow DDoS attacks or load the server when it allocates the buffer to process a request, so increasing it is not recommended. As this limitation may come from a third-party product (like a proxy, or a new version of Jetty), we have added another option on the client side that automatically puts the query part of the URL into the body of the request, where the size limit is, in theory, higher. The actual request method is then changed to POST and transferred like this to the Restlet server, where it is changed back into the original GET request (if and only if the requested web-service implements the AFS helper classes).

This mechanism is not standard, as a GET request should not have a body, and a POST method does not have the same semantics. Moreover, the automatic transformation of a GET into a POST for encapsulation purposes only is a function specific to AFS.

Server Request Size Limit:

To change the default HTTP request header size limit, set the following system property in the config.ini file of the server:

com.arcadsoftware.httprequestheaderlimit = 16384

Note that it is not recommended to go over 128kB.

Client Request Query proxying:

To set the threshold that transforms a GET+Query request into a POST+Body one, set the following system property to the desired limit:

com.arcadsoftware.httpquerylimit = 15000

where "15000" is the limit, in bytes, of the encoded query part of the URL. A negative value disables this mechanism. A value of zero forces the use of the request body for all queries, which is obviously not recommended, as forcing the use of a POST method is not correct.
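The client-side decision can be sketched as follows: encode the query, then compare its size with the threshold, using the zero and negative conventions of com.arcadsoftware.httpquerylimit. This hypothetical sketch is not the WebServiceAccess code, and whether a query exactly at the limit is moved is an assumption (here it stays in the URL). URLEncoder.encode(String, Charset) requires Java 10 or later.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.Map;
import java.util.StringJoiner;

/** Hypothetical sketch of the GET+Query to POST+Body decision; not the WebServiceAccess code. */
public final class QueryProxying {
    private QueryProxying() {}

    /** Form-encodes the parameters as "k1=v1&k2=v2" (UTF-8, spaces become '+'). */
    public static String encode(Map<String, String> params) {
        StringJoiner query = new StringJoiner("&");
        for (Map.Entry<String, String> e : params.entrySet()) {
            query.add(URLEncoder.encode(e.getKey(), StandardCharsets.UTF_8) + "="
                    + URLEncoder.encode(e.getValue(), StandardCharsets.UTF_8));
        }
        return query.toString();
    }

    /**
     * True if the encoded query should travel in a POST body rather than in the URL.
     * limit < 0: never; limit == 0: always; otherwise only when the query exceeds the limit.
     */
    public static boolean moveToBody(String encodedQuery, int limit) {
        if (limit < 0) {
            return false;
        }
        if (limit == 0) {
            return true;
        }
        // The encoded query is pure ASCII, so its length in chars equals its size in bytes.
        return encodedQuery.length() > limit;
    }
}
```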

This limit applies to the query part of the URL, but there may also be a limit on the whole URL or even on the total header size. Remember that the query is not the only part of the header, so this limit must be lower than the server request header size limit.

Note that you can override this limit by setting the corresponding property of the WebServiceAccess object (with the setQueryLimit method).

HEAD Method support

All resources extending the AFS OSGiResource may support the HEAD method by setting the lastModification date property. This date is used when the If-Modified-Since or If-Unmodified-Since request headers are present.

By default, a HEAD method call returns a 404 error code on a non-existing resource (i.e. setExisting(false)), or a 200 code if the resource exists. If the lastModification date is set, the returned code depends on the conditional header used:

If-Modified-Since: if the condition is met (the resource has been modified since the given date), a 200 code is returned; if the resource has not been modified, a 304 code is returned.
If-Unmodified-Since: if the condition is met, a 200 code is returned; if not (the resource has been modified), a 412 error code is returned.
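The decision table above can be written as a small function. This hypothetical sketch is not the OSGiResource code; it checks If-Unmodified-Since before If-Modified-Since, as RFC 7232 orders them, and truncates the modification date to whole seconds because HTTP dates carry no sub-second precision. Null arguments stand for "not set".

```java
import java.time.Instant;
import java.time.temporal.ChronoUnit;

/** Hypothetical sketch of the HEAD status decision; not the OSGiResource code. */
public final class ConditionalHead {
    private ConditionalHead() {}

    /** Returns the HTTP status for a HEAD request; null dates mean "not set" / "header absent". */
    public static int status(boolean exists, Instant lastModified,
            Instant ifModifiedSince, Instant ifUnmodifiedSince) {
        if (!exists) {
            return 404;
        }
        if (lastModified == null) {
            return 200; // no modification date: conditional headers cannot be evaluated
        }
        Instant modified = lastModified.truncatedTo(ChronoUnit.SECONDS);
        if (ifUnmodifiedSince != null && modified.isAfter(ifUnmodifiedSince)) {
            return 412; // modified after the given date: precondition failed
        }
        if (ifModifiedSince != null && !modified.isAfter(ifModifiedSince)) {
            return 304; // not modified since the given date
        }
        return 200;
    }
}
```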