Design - VanAndelInstitute/virtual-slide-viewer GitHub Wiki
Overview
Aperio SVS images are TIFF files of RGB JPEG tiles
Virtually every decent TIFF reader uses libtiff, since there are no reasonably good alternatives and processing very large TIFF files is a niche use case. TIFF files store multiple images, with metadata spread throughout the file, so extracting the image metadata, thumbnail, or label image requires seeks or byte-range requests.
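To illustrate why seeks (or, against remote storage like S3, byte-range requests) are unavoidable: a TIFF file is a chain of image file directories (IFDs), each found only by following an offset stored in the previous one. A minimal stdlib-only sketch of walking that chain:

```python
import struct
from io import BytesIO

def ifd_offsets(fp):
    """Walk a TIFF file's chain of image file directories (IFDs).
    Against S3, each seek here would become a separate byte-range request."""
    endian = '<' if fp.read(2) == b'II' else '>'   # II = little-endian, MM = big-endian
    magic, offset = struct.unpack(endian + 'HI', fp.read(6))
    assert magic == 42, 'not a classic TIFF file'
    offsets = []
    while offset:
        offsets.append(offset)
        fp.seek(offset)
        (n_entries,) = struct.unpack(endian + 'H', fp.read(2))
        fp.seek(offset + 2 + 12 * n_entries)       # skip the 12-byte tag entries
        (offset,) = struct.unpack(endian + 'I', fp.read(4))
    return offsets

# Smallest possible example: little-endian header plus one empty IFD at byte 8.
tiff = b'II' + struct.pack('<HI', 42, 8) + struct.pack('<HI', 0, 0)
print(ifd_offsets(BytesIO(tiff)))  # [8]
```

In a real SVS file each IFD is one page (base image, thumbnail, label, macro, or a pyramid level), so locating, say, the label image means chasing several of these offsets.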
Aperio SVS TIFF files store tiles with JPEG compression in RGB color space. Even if we used a viewer that could accurately render these, we would still need to reassemble standalone JFIF files from each tile plus the JPEG tables stored in the SVS page. More importantly, we will usually want to stitch multiple tiles together, serve a viewing window that doesn't correspond exactly to SVS tile boundaries, or resample tiles (e.g., for an in-between zoom level). OpenSlide handles all of this for us, using libjpeg-turbo, which can be compiled with SIMD support. OpenSlide expects regular files, so it won't work unmodified against S3; we're using Amazon EFS for now. Note that Lambda functions mounting EFS shares must run in the same VPC, which rules out Lambda@Edge.
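The stitch/crop/resample work that OpenSlide does internally can be sketched with Pillow (already in our stack as Pillow-SIMD). The 256-pixel tile size, the `tiles` mapping, and the function name are illustrative, not part of the SVS format:

```python
from PIL import Image

TILE = 256  # assumed source tile size for this sketch

def render_window(tiles, x, y, w, h, scale=1.0):
    """Stitch the source tiles covering a viewing window, crop to the window,
    and optionally resample for an in-between zoom level.
    `tiles` maps (col, row) -> a TILE x TILE PIL.Image."""
    col0, row0 = x // TILE, y // TILE
    col1, row1 = (x + w - 1) // TILE, (y + h - 1) // TILE
    canvas = Image.new('RGB', ((col1 - col0 + 1) * TILE, (row1 - row0 + 1) * TILE))
    for col in range(col0, col1 + 1):
        for row in range(row0, row1 + 1):
            canvas.paste(tiles[(col, row)], ((col - col0) * TILE, (row - row0) * TILE))
    window = canvas.crop((x - col0 * TILE, y - row0 * TILE,
                          x - col0 * TILE + w, y - row0 * TILE + h))
    if scale != 1.0:
        window = window.resize((int(w * scale), int(h * scale)), Image.LANCZOS)
    return window

# A 300x300 window at (200, 200) crosses four tiles; render it at half zoom.
tiles = {(c, r): Image.new('RGB', (TILE, TILE), (c * 100, r * 100, 0))
         for c in range(2) for r in range(2)}
view = render_window(tiles, 200, 200, 300, 300, scale=0.5)
print(view.size)  # (150, 150)
```

OpenSlide additionally handles the JPEG decode, color management, and pyramid-level selection, which is exactly the complexity we want to avoid reimplementing.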
If we were to reimplement image stitching, resampling, colorspace conversion and reassembly into JFIF ourselves, we might be able to leverage S3 Object Lambda to do this on the backend instead of in the browser. This would avoid the complexity and extra costs of EFS, VPC and DataSync.
I considered (and tested) pre-extracting SVS images into Deep Zoom format, but with 40X images containing 250-400K tiles, it’s expensive and slow, and costs much more in storage, compared to computing a few thousand Deep Zoom tiles on-the-fly in AWS Lambda.
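The arithmetic behind that trade-off is easy to sketch. Assuming a 254-pixel Deep Zoom tile size and an illustrative 40X slide size (the exact dimensions vary per scan), the tile count is dominated by the deepest pyramid level, while a viewing session only ever touches a small fraction of it:

```python
import math

def deepzoom_tile_counts(width, height, tile_size=254):
    """Tiles per Deep Zoom level, from full resolution down to 1x1.
    Each Deep Zoom level halves the previous one, rounding up."""
    counts = []
    while True:
        counts.append(math.ceil(width / tile_size) * math.ceil(height / tile_size))
        if width == 1 and height == 1:
            break
        width, height = max(1, math.ceil(width / 2)), max(1, math.ceil(height / 2))
    return counts

counts = deepzoom_tile_counts(100_000, 80_000)  # a plausible 40X scan size
# The full-resolution level alone holds more tiles than all other levels
# combined, so pre-extracting the whole pyramid pays the cost up front for
# tiles that will mostly never be viewed.
```

Generating only the requested tiles on demand in Lambda sidesteps both the extraction time and the storage cost of the unused majority.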
We can’t directly mount our EFS filesystem onto the Aperio ScanScope Windows 10 workstation, so we’re using AWS DataSync to transfer files.
Client-side image viewer
OpenSeadragon viewer w/ plugins
Ingest of slide images from ScanScope workstation into AWS
Getting SVS files from the ScanScope workstation into EFS is not straightforward. There is no way to upload to EFS with the AWS CLI the way there is for S3. While EFS can be made available to on-premises systems via AWS Direct Connect or VPN, mounting EFS on Windows isn't officially supported by AWS, and there are no good NFSv4 clients for Windows. (The NFS clients for Windows from UofM and OpenText do technically work, but their performance isn't good enough for the scanner to write to directly.) Complicating matters further, we can't use any VAI infrastructure apart from the scanner, the ScanScope workstation, and a secure internet connection to the pathology system, or it would be difficult to justify that the setup is just an instrument (for FISMA purposes). So research storage, HPC, and the VM cluster are off-limits. AWS DataSync works, but the setup is complicated and it adds quite a bit of latency.
As for using S3 for uploading SVS files instead, please see the limitations described above.
Dependencies
Frontend
- Image viewer: OpenSeadragon
- UI:
- Framework: React
- Component library: Material-UI
- React Table (headless table utility)
- Data client: Apollo GraphQL Client (React)
- Auth client: AWS Amplify
- Build tools: Snowpack, Rollup, Babel, ESLint
Backend code
- OpenSlide Python
- Pillow-SIMD
- libdmtx
- Boto3 (AWS Python SDK)
Backend infrastructure
- Metadata: AWS AppSync w/ DynamoDB
- API: API Gateway w/ Lambda for:
- generating requested SVS image views in real time
- processing slide file transfers and deletes
- Slide storage: EFS
- Requires VPC, endpoints, security groups
- Web: CloudFront, S3, Route 53, ACM
- Authentication: Cognito user pools (w/ federation)
- Authorization: IAM roles and policies
Deploy
- SAM, CloudFormation
File uploading
- DataSync
Access control
Read access to slide data and metadata is controlled by the consuming information system. In the absence of an access control integration mechanism (e.g., an OAuth authorization server or AWS temporary credentials), all read access is public.
Authentication for write access is implemented with Amazon Cognito on the backend. When a user attempts a restricted (i.e., write) operation, the frontend directs the user to authenticate. The API Gateway and AppSync APIs are configured to use AWS IAM authorization for consistency with other AWS services (namely S3).