Application Services (SQS,SNS,SWF) - devian-al/AWS-Solutions-Architect-Prep GitHub Wiki

Simple Queuing Service (SQS)

SQS Simplified

SQS is a web-based service that gives you access to a message queue, which stores messages while they wait for a consumer to process them. It helps decouple systems and scale AWS resources horizontally.

SQS Key Details

  • The point behind SQS is to decouple work across systems. This way, downstream services in a system can perform work when they are ready to rather than when upstream services feed them data.
  • In a hypothetical AWS environment running without SQS, Application A would pass data to Application B regardless of whether Application B was ready to receive it. With SQS, however, there is an intermediary step in which the data is stored temporarily in a buffer. It waits there until Application B pulls it. SQS is not a push-based service, so it must work in tandem with another service that polls it for messages.
  • There are two types of SQS queues:
    • Standard
    • FIFO.
  • Standard queues
    • Messages may be received out of order, depending on message size or however else SQS decides to optimize delivery.
    • Messages are guaranteed to be delivered at least once. Because of the asynchronous and highly distributed architecture, a message might occasionally be delivered more than once.
    • Throughput is nearly unlimited in transactions per second.
  • FIFO queues
    • Guarantee that messages leave the queue in the same order in which they entered it.
    • Guarantee exactly-once processing.
    • Are limited to 300 transactions per second without batching.
  • In both queue types, messages can be kept in the queue from one minute to 14 days (max).
    • The default retention period is 4 days.
  • Visibility timeouts are the mechanism by which a message that has been delivered to a reader is given a time frame in which that reader can fully process and delete it.
    • This is done by temporarily making them invisible to other readers. If the message is not fully processed within the time limit, the message becomes visible again.
      • This is another way in which messages can be duplicated.
      • If you want to reduce the chance of duplication, increase the visibility timeout.
      • The visibility timeout maximum is 12 hours.

Always remember that a message in an SQS queue continues to exist even after the EC2 instance has processed it, until you delete that message. Delete each message after processing it so that it is not received and processed again once the visibility timeout expires.
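
The receive, hide, reappear, delete cycle described above can be sketched with a small in-memory stand-in. This is not the SQS API: the ToyQueue class, its method names, and the explicit now argument are all hypothetical, chosen only so the timing is easy to follow.

```python
class ToyQueue:
    """In-memory stand-in for a queue with a visibility timeout (not the SQS API)."""

    def __init__(self, visibility_timeout):
        self.visibility_timeout = visibility_timeout
        self._messages = {}         # message id -> body
        self._invisible_until = {}  # message id -> time at which it is visible again
        self._next_id = 0

    def send(self, body):
        self._next_id += 1
        self._messages[self._next_id] = body
        self._invisible_until[self._next_id] = 0
        return self._next_id

    def receive(self, now):
        """Return (id, body) for the first visible message, or None if none is visible."""
        for msg_id, body in self._messages.items():
            if self._invisible_until[msg_id] <= now:
                # Hide the message from other readers for the timeout period.
                self._invisible_until[msg_id] = now + self.visibility_timeout
                return msg_id, body
        return None

    def delete(self, msg_id):
        """Remove the message for good; without this it reappears after the timeout."""
        self._messages.pop(msg_id, None)
        self._invisible_until.pop(msg_id, None)


queue = ToyQueue(visibility_timeout=30)
queue.send("resize image 42")

first = queue.receive(now=0)         # a consumer takes the message
hidden = queue.receive(now=10)       # still inside the timeout: nothing is visible
redelivered = queue.receive(now=31)  # never deleted, so the message is delivered again

queue.delete(redelivered[0])         # explicit delete after successful processing
after_delete = queue.receive(now=100)
```

In this sketch the second delivery at now=31 is exactly the duplication the bullets warn about. A longer timeout or a prompt delete avoids it.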

  • An SQS queue can contain an unlimited number of messages.
  • You cannot set a priority on individual items in an SQS queue. If message priority matters, create two separate SQS queues. The EC2 instances can poll the priority queue first and, once it is empty, process the messages from the second queue.
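
A minimal sketch of the two-queue priority pattern, using plain in-memory deques in place of real SQS queues. The function name next_message and the message names are made up for illustration.

```python
from collections import deque


def next_message(high_priority, low_priority):
    """Take from the high-priority queue first; fall back to the low-priority one."""
    if high_priority:
        return high_priority.popleft()
    if low_priority:
        return low_priority.popleft()
    return None


high = deque(["payment-1", "payment-2"])
low = deque(["report-1"])

processed = []
while (message := next_message(high, low)) is not None:
    processed.append(message)
```

The consumer only looks at the low-priority queue when the high-priority one is empty, which is the ordering the bullet describes.
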

| Standard Queue | FIFO Queue |
| --- | --- |
| Available in all regions. | Available in all regions where SQS is available. |
| Unlimited Throughput – Standard queues support a nearly unlimited number of transactions per second (TPS) per action. | High Throughput – By default, FIFO queues support up to 3,000 messages per second with batching (you can request a limit increase). Without batching, FIFO queues support up to 300 messages per second (300 send, receive, or delete operations per second). |
| At-Least-Once Delivery – A message is delivered at least once, but occasionally more than one copy of a message is delivered. | Exactly-Once Processing – A message is delivered once and remains available until a consumer processes and deletes it. Duplicates aren’t introduced into the queue. |
| Best-Effort Ordering – Occasionally, messages might be delivered in an order different from the one in which they were sent. | First-In-First-Out Delivery – The order in which messages are sent and received is strictly preserved. |
| Send data between applications when throughput is important. | Send data between applications when the order of events is important. |
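
The 300 and 3,000 figures quoted above are related by simple arithmetic, sketched below. The limit of 10 messages per batch call is an assumption based on the usual SQS batch size, and the function names are hypothetical.

```python
import math

FIFO_API_CALLS_PER_SECOND = 300  # default FIFO limit quoted above
MAX_MESSAGES_PER_BATCH = 10      # assumption: a batch action carries up to 10 messages


def fifo_messages_per_second(batch_size):
    """Messages per second when every API call carries batch_size messages."""
    if not 1 <= batch_size <= MAX_MESSAGES_PER_BATCH:
        raise ValueError("batch_size must be between 1 and 10")
    return FIFO_API_CALLS_PER_SECOND * batch_size


def seconds_to_send(total_messages, batch_size):
    """Lower bound on the time needed to send total_messages at the default limit."""
    return math.ceil(total_messages / fifo_messages_per_second(batch_size))


unbatched = fifo_messages_per_second(1)
batched = fifo_messages_per_second(10)
```

Under these assumptions, sending 9,000 messages takes at least 30 seconds one at a time and at least 3 seconds in full batches.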

SQS Polling

Polling is the means by which you query SQS for messages or work. Amazon SQS provides short polling and long polling to receive messages from a queue. By default, queues use short polling.

  • SQS long-polling: This polling technique waits until a message is available in the queue or until the wait time (at most 20 seconds) expires, whichever comes first. Because it avoids empty responses, long polling reduces the number of requests you make and therefore your overall cost over time.
  • SQS short-polling: This polling technique returns immediately, either with a message that’s already stored in the queue or empty-handed.
  • The ReceiveMessageWaitTimeSeconds is the queue attribute that determines whether you are using Short or Long polling. By default, its value is zero which means it is using short-polling. If it is set to a value greater than zero, then it is long-polling.
  • Every time you poll the queue, you incur a charge. So thoughtfully deciding on a polling strategy that fits your use case is important.
  • SQS supports dead-letter queues, which other queues can target for messages that can’t be processed successfully.
  • Setting up a dead-letter queue allows you to do the following:
    • Configure an alarm for any messages delivered to a dead-letter queue.
    • Examine logs for exceptions that might have caused messages to be delivered to a dead-letter queue.
    • Analyze the contents of messages delivered to a dead-letter queue to diagnose software or the producer’s or consumer’s hardware issues.
    • Determine whether you have given your consumer sufficient time to process messages.
  • When to use a dead-letter queue
    • When you have a standard SQS queue, to avoid additional costs from SQS handling failed messages over and over again. Dead-letter queues can help you troubleshoot incorrect message transmission operations.
    • To decrease the number of messages and to reduce the possibility of exposing your system to poison-pill messages (messages that can be received but can’t be processed).
  • When not to use a dead-letter queue
    • When you want to be able to keep retrying the transmission of a message indefinitely in your SQS standard queue.
    • When you don’t want to break the exact order of messages or operations in your SQS FIFO queue.
  • Delay queues let you postpone the delivery of new messages to a queue for a number of seconds (from 0 up to 900, that is, 15 minutes).
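
To see why long polling tends to cost less, the sketch below counts how many receive requests a consumer makes before a message shows up. It is a simplified model, not SQS behaviour in detail: the function name, the single-consumer setup, and the fixed pause between short polls are all assumptions.

```python
def requests_until_message(arrival_time, wait_time_seconds, pause_between_polls=0):
    """Count receive requests issued until a message arriving at arrival_time is returned.

    wait_time_seconds plays the role of ReceiveMessageWaitTimeSeconds:
    0 models short polling, a positive value models long polling.
    """
    if wait_time_seconds + pause_between_polls <= 0:
        raise ValueError("short polling needs a positive pause between polls")
    now = 0
    requests = 0
    while True:
        requests += 1
        # The request returns the message if it arrives before the wait expires.
        if arrival_time <= now + wait_time_seconds:
            return requests
        now += wait_time_seconds + pause_between_polls


# A message arrives 60 seconds after the consumer starts polling.
short_polling = requests_until_message(arrival_time=60, wait_time_seconds=0, pause_between_polls=1)
long_polling = requests_until_message(arrival_time=60, wait_time_seconds=20)
```

In this model, short polling once per second issues 61 requests while long polling with a 20-second wait issues 3. Since every request is charged, the difference adds up on mostly idle queues.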

Best Practices

  • Extend the message’s visibility timeout to the maximum time it takes to process and delete the message. If you don’t know how long processing takes, keep extending the visibility timeout for as long as your consumer is still working on the message.
  • Use the appropriate polling mode.
  • Configure a dead-letter queue to capture problematic messages.
  • To avoid inconsistent message processing by standard queues, avoid setting the number of maximum receives to 1 when you configure a dead-letter queue.
  • Don’t create reply queues per message.
    • Instead, create reply queues on startup, per producer, and use a correlation ID message attribute to map replies to requests. Don’t let your producers share reply queues.
  • Reduce cost by batching message actions.
  • Use message deduplication IDs to monitor duplicate sent messages.
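
The dead-letter queue advice above can be illustrated with a toy retry loop. The drain function and the sample messages are hypothetical. The RedrivePolicy field names deadLetterTargetArn and maxReceiveCount follow the SQS queue attribute as I understand it, and the ARN is a made-up placeholder.

```python
import json
from collections import deque


def redrive_policy(dead_letter_queue_arn, max_receive_count):
    """Build the JSON string used as a queue's RedrivePolicy attribute."""
    return json.dumps({
        "deadLetterTargetArn": dead_letter_queue_arn,
        "maxReceiveCount": max_receive_count,
    })


def drain(messages, handler, max_receive_count):
    """Process messages; a message that fails max_receive_count times is dead-lettered."""
    source = deque((body, 0) for body in messages)
    done, dead_letters = [], []
    while source:
        body, receives = source.popleft()
        receives += 1
        try:
            handler(body)
            done.append(body)  # success: the consumer would delete the message here
        except ValueError:
            if receives >= max_receive_count:
                dead_letters.append(body)         # give up and park it for inspection
            else:
                source.append((body, receives))   # becomes visible again and is retried
    return done, dead_letters


def handler(body):
    """Hypothetical consumer that cannot process one particular message."""
    if body == "not-json":
        raise ValueError("poison pill")


done, dead = drain(["a", "not-json", "b"], handler, max_receive_count=3)
policy = redrive_policy("arn:aws:sqs:us-east-1:123456789012:orders-dlq", 3)
```

With a maximum receive count of 1, a single transient failure would send a healthy message straight to the dead-letter queue, which is why the list above advises against that setting.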

Monitoring, Logging, and Automating

  • Monitor SQS queues using CloudWatch
  • Log SQS API Calls Using AWS CloudTrail
  • Automate notifications from AWS Services to SQS using CloudWatch Events

Security

  • Use IAM for user authentication.
  • SQS has its own resource-based permissions system that uses policies written in the same language used for IAM policies.
  • Protect data using Server-Side Encryption and AWS KMS.
  • SSE encrypts messages as soon as Amazon SQS receives them. The messages are stored in encrypted form and Amazon SQS decrypts messages only when they are sent to an authorized consumer.
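
As an illustration of the resource-based permissions mentioned above, the sketch below builds a queue policy that lets one SNS topic send messages to one queue. The helper name and both ARNs are hypothetical. Treat the statement as a starting point to check against the SQS documentation rather than a vetted policy.

```python
import json


def allow_topic_to_send(queue_arn, topic_arn):
    """Return a queue access policy allowing a single SNS topic to send messages."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "AllowTopicToSend",
                "Effect": "Allow",
                "Principal": {"Service": "sns.amazonaws.com"},
                "Action": "sqs:SendMessage",
                "Resource": queue_arn,
                # Only requests made on behalf of this topic are allowed.
                "Condition": {"ArnEquals": {"aws:SourceArn": topic_arn}},
            }
        ],
    }


policy = allow_topic_to_send(
    queue_arn="arn:aws:sqs:us-east-1:123456789012:orders",        # placeholder ARN
    topic_arn="arn:aws:sns:us-east-1:123456789012:order-events",  # placeholder ARN
)
policy_json = json.dumps(policy, indent=2)
```

The document uses the same JSON structure as an IAM policy, but it is attached to the queue itself rather than to a user or role.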

Simple Notification Service (SNS)

SNS Simplified

Simple Notification Service is a push-based messaging service that provides a highly scalable, flexible, and cost-effective way to publish custom messages to subscribers who wish to be informed about a certain topic.
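
The push model can be sketched in a few lines. ToyTopic is a hypothetical in-memory class, not the SNS API, and the subscribers are plain Python callables standing in for endpoints such as queues or functions.

```python
class ToyTopic:
    """In-memory stand-in for a topic that pushes each message to every subscriber."""

    def __init__(self):
        self._subscribers = []

    def subscribe(self, callback):
        self._subscribers.append(callback)

    def publish(self, message):
        """Push a copy of the message to every subscriber; return how many were notified."""
        for callback in self._subscribers:
            callback(message)
        return len(self._subscribers)


billing_inbox = []
shipping_inbox = []

topic = ToyTopic()
topic.subscribe(billing_inbox.append)
topic.subscribe(shipping_inbox.append)

delivered_to = topic.publish("order 42 placed")
```

Unlike the SQS examples, nothing here polls. The publisher's single call results in every subscriber receiving its own copy.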

SNS Key Details

  • SNS is mainly used to send alarms or alerts.
  • SNS provides topics for high-throughput, push-based, many-to-many messaging.
  • Using Amazon SNS topics, your publisher systems can fan out messages to a large number of subscriber endpoints for parallel processing, including Amazon SQS queues, AWS Lambda functions, and HTTP/S webhooks.
  • Additionally, SNS can be used to fan out notifications to end users using mobile push, SMS, and email.
    • You can send these push notifications to Apple, Google, Fire OS, and Windows devices.
  • SNS allows you to group multiple recipients using topics.
    • A topic is an access point for allowing recipients to dynamically subscribe for identical copies of the same notification.
    • One topic can support deliveries to multiple endpoint types.
    • When you publish to a topic, SNS appropriately formats copies of that message to send to whichever kind of device.
  • SNS also logs the delivery status of notification messages sent to topics with the following SNS endpoints:
    • Application
    • HTTP
    • Lambda
    • SQS
    • Amazon Kinesis Data Firehose
  • To prevent messages being lost, messages are stored redundantly across multiple AZs.
  • There is no long or short polling involved with SNS because messages are pushed to subscribers immediately.
  • SNS has flexible message delivery over multiple transport protocols and has a simple API.
  • SNS Delivery Retries
    • All messages sent to SNS are processed and delivered immediately. If a message cannot be successfully delivered on the first attempt, SNS implements a 4-phase retry policy:
      • retries with no delay in between attempts
      • retries with some minimum delay between attempts
      • retries with some back-off model (linear or exponential)
      • retries with some maximum delay between attempts
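
The four retry phases listed above can be pictured as a schedule of delays. The sketch below is one possible way to generate such a schedule. The function, its parameter names, and the interpolation formulas are assumptions for illustration and do not reproduce the exact SNS algorithm.

```python
def retry_delays(no_delay_retries, min_delay_retries, backoff_retries,
                 max_delay_retries, min_delay, max_delay, backoff="linear"):
    """Return the delay in seconds before each retry, phase by phase."""
    delays = [0] * no_delay_retries            # phase 1: immediate retries
    delays += [min_delay] * min_delay_retries  # phase 2: fixed minimum delay
    for step in range(1, backoff_retries + 1): # phase 3: grow from min to max
        fraction = step / backoff_retries
        if backoff == "linear":
            delays.append(min_delay + (max_delay - min_delay) * fraction)
        elif backoff == "exponential":
            delays.append(min_delay * (max_delay / min_delay) ** fraction)
        else:
            raise ValueError("backoff must be 'linear' or 'exponential'")
    delays += [max_delay] * max_delay_retries  # phase 4: fixed maximum delay
    return delays


linear_schedule = retry_delays(1, 2, 4, 1, min_delay=10, max_delay=50)
exponential_schedule = retry_delays(0, 0, 4, 0, min_delay=1, max_delay=16,
                                    backoff="exponential")
```

With the linear model the delays climb in equal steps between the minimum and the maximum. With the exponential model each step multiplies the previous delay by a constant factor.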

Monitoring

  • Monitoring SNS topics using CloudWatch
  • Logging SNS API calls using CloudTrail

Security

  • SNS provides encrypted topics to protect your messages from unauthorized and anonymous access. The encryption takes place on the server side.
  • SNS supports VPC Endpoints via AWS PrivateLink. You can use VPC Endpoints to privately publish messages to SNS topics, from a VPC, without traversing the public internet.
  • Using access control policies, you have detailed control over which endpoints a topic allows, who is able to publish to a topic, and under what conditions.
  • You can enable AWS X-Ray for your messages passing through Amazon SNS, making it easier to trace and analyze messages as they travel through to the downstream services.

Simple Workflow Service (SWF)

SWF Simplified

SWF is a fully managed state tracker and task coordinator in the cloud. You create the desired workflows with their associated tasks and any conditional logic you wish to apply, and store them with SWF.

SWF Key Details

  • SWF promotes a separation between the control flow of your background job’s stepwise logic and the actual units of work that contain your unique business logic.
  • SWF manages your workflow execution history and other details of your workflows across 3 availability zones.
  • SWF lets you write your application components and coordination logic in any programming language and run them in the cloud or on-premises.
  • SWF is highly scalable. It gives you full control over the number of workers that you run for each activity type and the number of instances that you run for a decider.
  • SWF also provides the AWS Flow Framework to help developers use asynchronous programming in the development of their applications.
  • Workflow
    • A set of activities that carry out some objective, together with logic that coordinates the activities.
    • Workflows coordinate and manage the execution of activities that can be run asynchronously across multiple computing devices and that can feature both sequential and parallel processing.
    • Activity Task
      • An activity task tells an activity worker to perform its function.
      • SWF stores tasks and assigns them to workers when they are ready, tracks their progress, and maintains their state, including details on their completion.
      • To coordinate tasks, you write a program that gets the latest state of each task from SWF and uses it to initiate subsequent tasks.
      • Activity tasks can run synchronously or asynchronously. They can be distributed across multiple computers, potentially in different geographic regions, or they can all run on the same computer.
    • Lambda task
      • Executes a Lambda function instead of a traditional SWF activity.
    • Decision task
      • A Decision task tells a decider that the state of the workflow execution has changed so that the decider can determine the next activity that needs to be performed. The decision task contains the current workflow history.
      • SWF assigns each decision task to exactly one decider and allows only one decision task at a time to be active in a workflow execution.
    • Workflow Starter
      • Any application that can initiate workflow executions.
    • Activity Worker
      • An activity worker is a program that receives activity tasks, performs them, and provides results back.
      • Implement workers to perform tasks. These workers can run either on cloud infrastructure, or on your own premises.
      • Different activity workers can be written in different programming languages and run on different operating systems.
      • Assigning particular tasks to particular activity workers is called task routing. Task routing is optional.
    • Decider
      • A software program that contains the coordination logic in a workflow.
      • It schedules activity tasks, provides input data to the activity workers, processes events that arrive while the workflow is in progress, and ultimately ends the workflow when the objective has been completed.
      • Both activity workers and the decider receive their tasks by polling the SWF service.
    • Workflow Execution History
      • The workflow execution history is composed of events, where an event represents a significant change in the state of the workflow execution.
      • SWF informs the decider of the state of the workflow by including, with each decision task, a copy of the current workflow execution history.
    • Polling
      • Deciders and activity workers communicate with SWF using long polling.
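
The relationship between the decider, the workflow history, and activity tasks can be sketched as a pure function from history to next decision. The event and decision names mirror SWF's vocabulary, but the dictionary shapes, the activity names, and the run_workflow loop are simplified assumptions and involve no calls to SWF.

```python
# Hypothetical activities for an order workflow, in the order they must run.
ORDER_OF_ACTIVITIES = ["verify_order", "charge_card", "ship_order"]


def decide(history):
    """Look at the workflow history and return the next decision."""
    if any(event["type"] == "ActivityTaskFailed" for event in history):
        return {"decision": "FailWorkflowExecution"}
    completed = [event["activity"] for event in history
                 if event["type"] == "ActivityTaskCompleted"]
    for activity in ORDER_OF_ACTIVITIES:
        if activity not in completed:
            return {"decision": "ScheduleActivityTask", "activity": activity}
    return {"decision": "CompleteWorkflowExecution"}


def run_workflow():
    """Drive the decider, pretending that every scheduled activity succeeds."""
    history = [{"type": "WorkflowExecutionStarted"}]
    scheduled = []
    while True:
        decision = decide(history)
        if decision["decision"] != "ScheduleActivityTask":
            return scheduled, decision["decision"]
        scheduled.append(decision["activity"])
        # Stand-in for an activity worker picking up the task and reporting back.
        history.append({"type": "ActivityTaskCompleted",
                        "activity": decision["activity"]})


scheduled, outcome = run_workflow()
```

The decider holds no state of its own. Everything it needs arrives in the history, which matches the description of decision tasks above.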

The SWF pipeline is composed of three different worker applications that help bring a job to completion:

  • SWF Actors are workers that trigger the beginning of a workflow.
  • SWF Deciders are workers that control the flow of the workflow once it's been started.
  • SWF Activity Workers are the workers that actually carry out the task to completion.
  • With SWF, workflow executions can last up to one year compared to the 14 days maximum retention period for SQS.

AWS Step Functions

  • AWS Step Functions is a web service that provides serverless orchestration for modern applications. It enables you to coordinate the components of distributed applications and microservices using visual workflows.
  • Step Functions is based on the concepts of tasks and state machines.
    • A task performs work by using an activity or an AWS Lambda function, or by passing parameters to the API actions of other services.
    • A finite state machine can express an algorithm as a number of states, their relationships, and their input and output.
  • You define state machines using the JSON-based Amazon States Language.
  • A state is referred to by its name, which can be any string, but which must be unique within the scope of the entire state machine. An instance of a state exists until the end of its execution.
    • There are 8 types of states:
      • Task state – Do some work in your state machine. AWS Step Functions can invoke Lambda functions directly from a task state.
      • Choice state – Make a choice between branches of execution
      • Fail state – Stops execution and marks it as failure
      • Succeed state – Stops execution and marks it as a success
      • Pass state – Simply pass its input to its output or inject some fixed data
      • Wait state – Provide a delay for a certain amount of time or until a specified time/date
      • Parallel state – Begin parallel branches of execution
      • Map state – Adds a for-each loop condition
    • Common features between states
      • Each state must have a Type field indicating what type of state it is.
      • Each state can have an optional Comment field to hold a human-readable comment about, or description of, the state.
      • Each state (except a Succeed or Fail state) requires a Next field or, alternatively, can become a terminal state by specifying an End field.
  • Activities enable you to place a task in your state machine where the work is performed by an activity worker that can be hosted on Amazon EC2, Amazon ECS, or mobile devices.
  • Activity tasks let you assign a specific step in your workflow to code running in an activity worker. Service tasks let you connect a step in your workflow to a supported AWS service.
  • With Transitions, after executing a state, AWS Step Functions uses the value of the Next field to determine the next state to advance to. States can have multiple incoming transitions from other states.
  • State machine data is represented by JSON text.
  • It takes the following forms:
    • The initial input into a state machine
    • Data passed between states
    • The output from a state machine
  • Individual states receive JSON as input and usually pass JSON as output to the next state.
  • Common Use Cases
    • Step Functions can help ensure that long-running, multiple ETL jobs execute in order and complete successfully, instead of manually orchestrating those jobs or maintaining a separate application.
    • By using Step Functions to handle a few tasks in your codebase, you can approach the transformation of monolithic applications into microservices as a series of small steps.
    • You can use Step Functions to easily automate recurring tasks such as patch management, infrastructure selection, and data synchronization, and Step Functions will automatically scale, respond to timeouts, and retry failed tasks.
    • Use Step Functions to combine multiple AWS Lambda functions into responsive serverless applications and microservices, without having to write code for workflow logic, parallel processes, error handling, timeouts or retries.
    • You can also orchestrate data and services that run on Amazon EC2 instances, containers, or on-premises servers.
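
To make the state machine concepts above concrete, the sketch below writes a small Amazon States Language definition as a Python dictionary, checks the common field rules, and walks its transitions. The state names and the check and run helpers are hypothetical. The walker understands only Pass, Choice (with BooleanEquals), Succeed, and Fail, so it is a teaching aid rather than an interpreter of the full language.

```python
STATE_TYPES = {"Task", "Choice", "Fail", "Succeed", "Pass", "Wait", "Parallel", "Map"}

definition = {
    "Comment": "Hypothetical stock check",
    "StartAt": "CheckStock",
    "States": {
        "CheckStock": {"Type": "Pass", "Result": {"in_stock": True}, "Next": "InStock?"},
        "InStock?": {
            "Type": "Choice",
            "Choices": [{"Variable": "$.in_stock", "BooleanEquals": True, "Next": "Ship"}],
            "Default": "OutOfStock",
        },
        "Ship": {"Type": "Succeed"},
        "OutOfStock": {"Type": "Fail", "Error": "OutOfStock", "Cause": "Item unavailable"},
    },
}


def check(definition):
    """Return a list of problems with the Type, Next, and End fields."""
    problems = []
    states = definition["States"]
    if definition["StartAt"] not in states:
        problems.append("StartAt names an unknown state")
    for name, state in states.items():
        kind = state.get("Type")
        if kind not in STATE_TYPES:
            problems.append(f"{name}: unknown Type {kind!r}")
        elif kind in ("Succeed", "Fail", "Choice"):
            continue  # terminal states need no Next; Choice puts Next inside its rules
        elif "Next" not in state and not state.get("End"):
            problems.append(f"{name}: needs Next or End")
        elif "Next" in state and state["Next"] not in states:
            problems.append(f"{name}: Next names an unknown state")
    return problems


def run(definition, data):
    """Follow transitions from StartAt; return (outcome, list of visited state names)."""
    name, visited = definition["StartAt"], []
    while True:
        state = definition["States"][name]
        visited.append(name)
        kind = state["Type"]
        if kind in ("Succeed", "Fail"):
            return kind, visited
        if kind == "Pass":
            data = state.get("Result", data)  # inject fixed data or pass input through
            if state.get("End"):
                return "Succeed", visited
            name = state["Next"]
        elif kind == "Choice":
            name = state["Default"]
            for rule in state["Choices"]:
                field = rule["Variable"][2:]  # only handles paths of the form $.field
                if data.get(field) == rule["BooleanEquals"]:
                    name = rule["Next"]
                    break
        else:
            raise NotImplementedError(kind)


problems = check(definition)
outcome, path = run(definition, {})
```

Changing the Result of CheckStock to {"in_stock": False} sends the walk through the Default branch to the Fail state instead.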

Differences


Amazon SWF vs AWS Step Functions

  • Consider using AWS Step Functions for all your new applications, since it provides a more productive and agile approach to coordinating application components using visual workflows. If you require external signals (deciders) to intervene in your processes, or you would like to launch child processes that return a result to a parent, then you should consider Amazon SWF.
  • With Step Functions, you write state machines in declarative JSON. With Amazon SWF, you write a decider program to separate activity steps from decision steps. This gives you complete control over your orchestration logic, but increases the complexity of developing applications. You may write decider programs in the programming language of your choice, or you may use the Flow Framework, which is a library for building SWF applications, to use programming constructs that structure asynchronous interactions for you.

AWS Step Functions vs Amazon SQS

  • Use Step Functions when you need to coordinate service components in the development of highly scalable and auditable applications. Use SQS when you need a reliable, highly scalable, hosted queue for sending, storing, and receiving messages between services.
  • Step Functions keeps track of all tasks and events in an application. Amazon SQS requires you to implement your own application-level tracking, especially if your application uses multiple queues.
  • The Step Functions Console and visibility APIs provide an application-centric view that lets you search for executions, drill down into an execution’s details, and administer executions. Amazon SQS requires implementing such additional functionality.
  • Step Functions offers several features that facilitate application development, such as passing data between tasks and flexibility in distributing tasks. Amazon SQS requires you to implement some application-level functionality.
  • You can use Amazon SQS to build basic workflows to coordinate your distributed application, but you get this facility out-of-the-box with Step Functions, alongside other application-level capabilities.

Amazon SQS vs Amazon SWF

  • SWF API actions are task-oriented. SQS API actions are message-oriented.
  • SWF keeps track of all tasks and events in an application. SQS requires you to implement your own application-level tracking, especially if your application uses multiple queues.
  • The SWF Console and visibility APIs provide an application-centric view that lets you search for executions, drill down into an execution’s details, and administer executions. SQS requires implementing such additional functionality.
  • SWF offers several features that facilitate application development, such as passing data between tasks, signaling, and flexibility in distributing tasks. SQS requires you to implement some application-level functionality.
  • In addition to a core SDK that calls service APIs, SWF provides the Flow Framework, with which you can write distributed applications using programming constructs that structure asynchronous interactions.