UploadingDocuments - Huddle/huddle-apis GitHub Wiki
Uploading documents via the Huddle API is a multi-part process that follows these steps:
- Create a new document, or a new document version
- Upload binary content to the upload URI advertised by your newly created document resource. Ideally, you should use the resumable uploads mechanism to split the upload into smaller, manageable chunks.
- Once all of the bytes have been sent (and received), the client should poll for progress, waiting for an indication that the upload is complete.
- If the upload failed, you should consider cleanup actions. For new documents, one recommendation is to delete the document that was created and start the process again.
Once you have created either a new document, or a new document version, you can upload binary content.
If the document is locked by someone else, then this operation will fail with a 409 Conflict.
The response to a successful create version, or create document request will contain a Link with a @rel value of upload.
Parse the body to find this link and extract the @href property. To upload data, construct a multipart/form-data request and POST it to the upload URI.
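Finding the upload link could be sketched as follows. This is a minimal illustration, not the API's reference client: the sample body, its `document` root element, and the `find_link` helper are assumptions made for the example, and only the `link` element with `rel` and `href` attributes comes from the description above.

```python
import xml.etree.ElementTree as ET

# Hypothetical response body: the element layout is assumed for illustration.
SAMPLE_DOCUMENT = """<document xmlns="http://schema.huddle.net/2011/02/" title="My File">
  <link rel="self" href="https://api.huddle.net/documents/123" />
  <link rel="upload" href="https://api.huddle.net/uploads/abc" />
</document>"""


def find_link(xml_text, rel):
    """Return the href of the first <link> with the given rel, or None."""
    root = ET.fromstring(xml_text)
    for element in root.iter():
        # Compare on the local name so the lookup works with or without a namespace.
        local_name = element.tag.rsplit("}", 1)[-1]
        if local_name == "link" and element.get("rel") == rel:
            return element.get("href")
    return None


upload_uri = find_link(SAMPLE_DOCUMENT, "upload")
print(upload_uri)
```

Returning `None` when the link is absent lets the caller decide how to react, for example when the create request did not succeed.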
In order to make uploads more resilient, binary content can be uploaded in arbitrary, sequential byte ranges in individual requests. That way if one request fails, only that range needs to be uploaded again, rather than failing the upload and having to start again from scratch.
The range of the current chunk is specified in the Content-Range header with the format bytes {start}-{end}/{total}. For example:
Content-Range: bytes 50-100/500
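A client might build and check this header value with two small helpers. This is a sketch only; the function names are hypothetical and the validation rule (0 <= start <= end <= total) is an assumption inferred from the examples in this page.

```python
import re

# Matches values such as "bytes 50-100/500".
_CONTENT_RANGE = re.compile(r"^bytes (\d+)-(\d+)/(\d+)$")


def format_content_range(start, end, total):
    """Build a Content-Range value in the bytes {start}-{end}/{total} format."""
    if not 0 <= start <= end <= total:
        raise ValueError("expected 0 <= start <= end <= total")
    return f"bytes {start}-{end}/{total}"


def parse_content_range(value):
    """Split a Content-Range value into (start, end, total) integers."""
    match = _CONTENT_RANGE.match(value)
    if match is None:
        raise ValueError(f"not a Content-Range value: {value!r}")
    start, end, total = (int(group) for group in match.groups())
    return start, end, total


print(format_content_range(50, 100, 500))
```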
On successful processing of the chunk, the server will respond with 202 Accepted containing the uploaded range:
HTTP/1.1 202 Accepted
Location: https://api.huddle.net/uploads/abc
Content-Length: 0
Range: 50-100
The chunks must be uploaded sequentially, i.e. the range of the first chunk must always start at 0 and subsequent chunks must start where the previous chunk ended. An upload is considered complete when the whole content size has been received.
Since the chunk size is arbitrary, it is possible to upload the whole content in one request simply by specifying the whole range, e.g. Content-Range: bytes 0-500/500.
It is also possible to vary the chunk size during an upload, for example to account for varying connection quality.
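Planning the sequential ranges could look like the sketch below. It assumes, based on the examples in this page (the whole content is sent as 0-500/500, and each chunk starts where the previous one ended), that the end of a range is the offset at which the next chunk begins. The `plan_chunks` name is hypothetical.

```python
def plan_chunks(total, chunk_size):
    """Return ordered (start, end) pairs covering the content from 0 to total."""
    if total <= 0 or chunk_size <= 0:
        raise ValueError("total and chunk_size must be positive")
    ranges = []
    start = 0
    while start < total:
        end = min(start + chunk_size, total)
        ranges.append((start, end))
        start = end  # the next chunk starts where this one ended
    return ranges


for start, end in plan_chunks(500, 200):
    print(f"Content-Range: bytes {start}-{end}/500")
```

Because each range is computed from the previous end, the chunk size can be changed between calls by re-planning from the last acknowledged offset.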
In the event that uploading a particular range of bytes was unsuccessful, you can safely retry the operation.
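A retry wrapper for a single range might be sketched like this. The `send_chunk` callable is hypothetical: it stands in for whatever code POSTs one range and returns the HTTP status code, and treating only 202 as success follows the response shown earlier in this page.

```python
import time


def send_with_retry(send_chunk, start, end, attempts=3, delay_seconds=0.0):
    """Call send_chunk(start, end) until it returns 202; return the attempt count."""
    last_status = None
    for attempt in range(1, attempts + 1):
        last_status = send_chunk(start, end)
        if last_status == 202:
            return attempt
        if attempt < attempts:
            time.sleep(delay_seconds)  # pause before resending the same range
    raise RuntimeError(
        f"range {start}-{end} failed after {attempts} attempts "
        f"(last status {last_status})"
    )
```

Only the failed range is resent, so ranges that were already accepted do not need to be uploaded again.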
When the final set of bytes has been received, the upload will go into a 'processing' state on the server. If you need to know when this is complete, you can poll for progress. The 202 response will include the link to the progress endpoint.
Huddle currently uses multipart/form-data to encode uploaded documents. This allows us to include mime-type and filename metadata inline with the binary content.
The multipart/form-data media type is covered by RFC 2388.
If you are working in a browser-environment, you may wish to create an HTML form to perform the upload. Most languages will have some library support for Multipart form data but this article assumes you are writing your own.
A multipart request contains several "chunks" of data, separated by a boundary. To create a multipart request, first create a boundary string that is unlikely to appear in the file content. This boundary string does not have to be unique, but it must not appear elsewhere in the request body.
These examples are illustrative only, and do not represent best, or even sensible, practice.
// Appends a GUID so the boundary is unlikely to collide with the file content.
string createBoundaryString() {
    return "my_upload_client_boundary" + Guid.NewGuid();
}
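The same idea in Python, with an explicit check that the boundary does not occur in the content, might look like the sketch below. The `create_boundary` helper is hypothetical and, like the example above, is illustrative rather than recommended practice.

```python
import uuid


def create_boundary(content, prefix="my_upload_client_boundary"):
    """Return a boundary string that does not occur in the given bytes."""
    while True:
        # uuid4().hex gives 32 hexadecimal characters, all valid in a boundary.
        boundary = prefix + uuid.uuid4().hex
        if boundary.encode("ascii") not in content:
            return boundary


boundary = create_boundary(b"example file content")
print(boundary)
```

Scanning the content is only practical when the chunk is held in memory; for streamed content, a random suffix alone makes a collision very unlikely.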
Next, use this boundary string to set your Content-Type header:
// Builds the full header line, including the boundary parameter.
string MakeContentTypeHeader(string boundary) {
    return "Content-Type: multipart/form-data; boundary=" + boundary;
}
You can now use your boundary string to construct the request. A successful request must have the following form. Note the leading dashes on both the opening and closing boundaries, but the trailing dashes on the closing boundary only. We've found that constructing an HTML form is an excellent way of exploring the multipart request format when debugging uploads.
POST /uploads/abc HTTP/1.1
Host: api.huddle.net
Content-Type: multipart/form-data; boundary=[boundary-string]
Content-Length: 288
Content-Range: bytes 0-123/1000

--[boundary-string]
Content-Disposition: form-data; name="content"; filename="tps-report-may2010.doc"
Content-Type: application/msword

[BINARY CONTENT GOES HERE]
--[boundary-string]--
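Assembling a body of this shape by hand could be sketched as follows. The `build_multipart_body` helper is hypothetical; it produces a single part named content, uses CRLF line endings, and does not escape quotes in the filename, which a real client would need to handle.

```python
def build_multipart_body(boundary, filename, content_type, content):
    """Return the bytes of a single-part multipart/form-data body."""
    head = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="content"; filename="{filename}"\r\n'
        f"Content-Type: {content_type}\r\n"
        "\r\n"  # blank line separates the part headers from the binary content
    ).encode("utf-8")
    # The closing boundary has both leading and trailing dashes.
    tail = f"\r\n--{boundary}--\r\n".encode("ascii")
    return head + content + tail


body = build_multipart_body(
    "my_upload_client_boundary123",
    "tps-report-may2010.doc",
    "application/msword",
    b"\x00\x01\x02",
)
print(len(body))  # a candidate value for the Content-Length header
```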
HTML forms natively use multipart/form-data when uploading files. This makes them ideal for integrating with the Huddle API.
To upload via an HTML form, create an <input type="file" /> element with the name content. Your form must have a @method of POST and an @enctype of multipart/form-data.
When you create a document (see Create a document), create a new version (see Creating a new version of a document), or update a document's title (see Updating document title and description), you provide us with a title. We use this title as the base filename in the Content-Disposition header when you download the document (see Downloading Document Content).
Note that this is assumed to be the base filename and does not include the extension part of the filename.
When you upload a document (see Uploading Documents), you provide us with a Content-Disposition header from which we read the filename extension. We use this as the extension of the document's filename in the Content-Disposition header when you download the document.
We read the MIME type from the Content-Type header that you provide when uploading the document, and return it in the Content-Type header when you download the document.
For example, if you create a document with:
POST /folders/12345/documents HTTP/1.1
Host: api.huddle.net
Authorization: OAuth2 fooglybooglynooglybeep
Content-Type: application/vnd.huddle.data+xml
X-Upload-Content-Type: application/msword
X-Upload-Content-Length: 54126
<document title="My File" description="Important Stuff" />
and then upload the content with
POST /uploads/abc HTTP/1.1
Host: api.huddle.net
Content-Type: multipart/form-data; boundary=[boundary-string]
Content-Length: 288
Content-Range: bytes 0-123/54126
--[boundary-string]
Content-Disposition: form-data; name="content"; filename="Foo.doc"
Content-Type: application/msword
[BINARY CONTENT GOES HERE]
--[boundary-string]--
We will take "My File" from the title and ".doc" from the upload Content-Disposition filename, and combine them in the Content-Disposition header when you download the document:
HTTP/1.1 200 OK
Content-Type: application/msword
Content-Length: 198
Content-Disposition: attachment; filename="My File.doc"
Hello - this is the content of your file.
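The combination rule described above can be modelled in a few lines. This is an illustration of the documented behaviour, not Huddle's server code, and the `download_filename` name is hypothetical.

```python
import os.path


def download_filename(title, upload_filename):
    """Join the document title with the extension of the uploaded filename."""
    _, extension = os.path.splitext(upload_filename)
    return title + extension


print(download_filename("My File", "Foo.doc"))
```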
A common mistake is to include the extension along with the base filename in the title when creating the document. This is likely to have unintended consequences. For example, if you create a document as follows:
POST /folders/12345/documents HTTP/1.1
Host: api.huddle.net
Authorization: OAuth2 fooglybooglynooglybeep
Content-Type: application/vnd.huddle.data+xml
X-Upload-Content-Type: application/msword
X-Upload-Content-Length: 54126
<document title="My File.doc" description="Important Stuff" />
and upload the content as follows
POST /uploads/abc HTTP/1.1
Host: api.huddle.net
Content-Type: multipart/form-data; boundary=[boundary-string]
Content-Range: bytes 0-54126/54126
--[boundary-string]
Content-Disposition: form-data; name="content"; filename="My File.doc"
Content-Type: application/msword
[BINARY CONTENT GOES HERE]
--[boundary-string]--
then we will add the extension to the file. Because the base filename already has what appears to be an extension, it will appear 'doubled up' in the Content-Disposition filename:
HTTP/1.1 200 OK
Content-Type: application/msword
Content-Length: 198
Content-Disposition: attachment; filename="My File.doc.doc"
Hello - this is the content of your file.
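One way a client might avoid the doubled extension is to strip it from the title before creating the document. The helper below is a hypothetical sketch; the case-insensitive comparison is an assumption about what callers would want, not documented API behaviour.

```python
import os.path


def title_without_extension(title, upload_filename):
    """Drop the upload file's extension from the title if it is already there."""
    _, extension = os.path.splitext(upload_filename)
    if extension and title.lower().endswith(extension.lower()):
        return title[: -len(extension)]
    return title


print(title_without_extension("My File.doc", "My File.doc"))
```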
An upload can be a long-running operation.
To retrieve the upload progress, GET the upload URI (or use the URI returned in the Location header when you send the bytes). You can poll this endpoint, waiting for an indication that the upload is in a complete state. The best way to determine completion is to wait until the response has an 'uploadStatus' of Complete. Note, though, that Error and Cancelled are also terminal states for the upload operation.
In cases where the upload is for a document and is complete, the resource will include a link to get the uploaded document content.
When no information for the upload progress is available, the following message will be returned: 'Sorry, progress information is unavailable at the moment'. It is likely this is intermittent and subsequent calls will provide progress.
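A polling loop along these lines could be sketched as below. The `fetch_status` callable is hypothetical: it stands in for code that GETs the progress resource and returns the uploadStatus value, or None when progress information is unavailable. The interval and poll limit are arbitrary assumptions.

```python
import time

# Terminal states named in the text above.
TERMINAL_STATES = {"Complete", "Error", "Cancelled"}


def wait_for_upload(fetch_status, interval_seconds=2.0, max_polls=30):
    """Poll until the upload reaches a terminal state and return that state."""
    for _ in range(max_polls):
        status = fetch_status()
        if status in TERMINAL_STATES:
            return status
        # Non-terminal or unavailable progress: wait and try again.
        time.sleep(interval_seconds)
    raise TimeoutError("upload did not reach a terminal state")
```

Callers should check the returned value, since Error and Cancelled end the loop just as Complete does.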
GET /uploads/123 HTTP/1.1
Accept: application/vnd.huddle.data+xml
HTTP/1.1 200 OK
Content-Type: application/vnd.huddle.data+xml
<documentUploadProgress xmlns="http://schema.huddle.net/2011/02/">
<link rel="self" href="/documentuploads/123" />
<link rel="documentContent" href="/documents/123/content" />
<totalBytes>987123</totalBytes>
<bytesWritten>963699</bytesWritten>
<bytesRemaining>23424</bytesRemaining>
<estimatedSecondsRemaining>69</estimatedSecondsRemaining>
<percentageComplete>95</percentageComplete>
<documentStatus>Processing</documentStatus>
<uploadStatus>Uploading</uploadStatus>
<message />
</documentUploadProgress>
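Reading the fields of a progress response might be sketched as follows. The sample is a shortened, well-formed version of the response above, and the `parse_progress` helper and its choice of fields are assumptions for illustration.

```python
import xml.etree.ElementTree as ET

NAMESPACE = {"h": "http://schema.huddle.net/2011/02/"}

SAMPLE_PROGRESS = """<documentUploadProgress xmlns="http://schema.huddle.net/2011/02/">
  <link rel="self" href="/documentuploads/123" />
  <totalBytes>987123</totalBytes>
  <bytesWritten>963699</bytesWritten>
  <percentageComplete>95</percentageComplete>
  <uploadStatus>Uploading</uploadStatus>
</documentUploadProgress>"""


def parse_progress(xml_text):
    """Extract a few progress fields from a documentUploadProgress body."""
    root = ET.fromstring(xml_text)

    def number(name):
        # Missing elements fall back to 0 rather than raising.
        return int(root.findtext(f"h:{name}", default="0", namespaces=NAMESPACE))

    return {
        "uploadStatus": root.findtext("h:uploadStatus", namespaces=NAMESPACE),
        "totalBytes": number("totalBytes"),
        "bytesWritten": number("bytesWritten"),
        "percentageComplete": number("percentageComplete"),
    }


print(parse_progress(SAMPLE_PROGRESS))
```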
Name | Description |
---|---|
totalBytes | Size of the upload in bytes. |
bytesWritten | Approximate number of bytes already uploaded to Huddle. |
bytesRemaining | Approximate number of bytes still to be uploaded to Huddle. |
estimatedSecondsRemaining | Estimate of the remaining time to finish the upload. |
percentageComplete | Estimate of what percentage of the document has been uploaded to Huddle. |
documentStatus | Complete = the document is not processing |
uploadStatus | Pending = the upload has been initiated but we have not received any bytes yet; Uploading = the upload is in progress, bytes have been received and the server is waiting for the next bytes; Processing = all bytes have been received and we are now processing the file; Complete = the upload and subsequent processing is complete; Cancelled = the upload has been cancelled; Error = a processing error occurred; Expired = the upload has not received any bytes within an expected period of time and is therefore presumed to be aborted. Note, this value may not be included in the response until ready for GA. |
Name | Description | Methods |
---|---|---|
self | The current URI of this document upload. | GET |
documentContent | If the upload is complete then this link will give access to the Document content. | GET |
start = documentUploadProgress
documentUploadProgress = element documentUploadProgress {
link+,
element h:totalBytes {xsd:unsignedLong},
element h:bytesWritten {xsd:unsignedLong},
element h:bytesRemaining {xsd:unsignedLong},
element h:estimatedSecondsRemaining {xsd:unsignedInt},
element h:percentageComplete {xsd:unsignedInt},
element h:documentStatus {"Complete"|"Error"|"Moving"|"Processing"},
element h:uploadStatus {"Pending"|"Uploading"|"Processing"|"Complete"|"Cancelled"|"Error"|"Expired"}
}
To cancel an upload, perform an HTTP DELETE on the link with @rel value of upload advertised on the Document resource. This will delete the document version metadata that was created as part of the first stage of the UploadingDocuments process.
DELETE /uploads/123 HTTP/1.1
HTTP/1.1 200 OK
Content-Type: application/vnd.huddle.data+xml
If an upload is already complete and you send a delete request, you will receive a 409 Conflict response.
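Cancelling from a client could be sketched with the standard library as below. Both helper names are hypothetical, and the Authorization scheme is copied from the request examples earlier in this page; returning 409 instead of raising is a design choice made for this sketch.

```python
import urllib.error
import urllib.request


def build_cancel_request(upload_uri, access_token):
    """Build a DELETE request for the upload URI."""
    return urllib.request.Request(
        upload_uri,
        method="DELETE",
        headers={"Authorization": f"OAuth2 {access_token}"},
    )


def cancel_upload(upload_uri, access_token):
    """Send the DELETE; return the status code, including 409 if already complete."""
    request = build_cancel_request(upload_uri, access_token)
    try:
        with urllib.request.urlopen(request) as response:
            return response.status
    except urllib.error.HTTPError as error:
        if error.code == 409:
            return 409  # the upload was already complete
        raise
```

A caller can treat a 409 as "nothing to cancel" and, if the content is unwanted, delete the document itself instead.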