In tech, nothing is more important than having the right tool at the right time in the right place - especially when you are working with an agile team. Agile, at its core, is about having the means to hit your targets on time, while remaining flexible enough to accommodate an array of changing requirements as they arrive. Because of our company's need to maintain these agile capabilities, we chose to work with AWS. By doing so, we reduced the cost of our operations significantly.
In this article, we explain the AWS services that we use and highlight some of their primary benefits.
EKS (Elastic Kubernetes Service)
Let’s start with the bread and butter of any modern distributed system — Kubernetes.
With EKS, AWS provides the master nodes and manages the control-plane infrastructure, while your workloads run on EC2 instances. You only need to worry about the definitions of your services.
In our current setup, we have a different namespace for each stage environment. The only two things that change are the variables/secrets and the number of instances running. This way, we have a dev environment that is very similar to production, which is essential for debugging and finding that dodgy error that bothers you so much.
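As a minimal sketch of that setup, here is how the per-environment namespaces could be created with the official Python Kubernetes client. The environment names are just examples, and we assume your kubeconfig already points at the EKS cluster:

```python
# Sketch: create one namespace per stage environment on the EKS cluster.
# Assumes `pip install kubernetes` and a kubeconfig pointing at the cluster.
from kubernetes import client, config

config.load_kube_config()  # use config.load_incluster_config() when running inside a pod
core_v1 = client.CoreV1Api()

for env in ("dev", "staging", "production"):  # example environment names
    namespace = client.V1Namespace(
        metadata=client.V1ObjectMeta(name=env, labels={"stage": env})
    )
    core_v1.create_namespace(namespace)
```

The per-environment variables and secrets then live as ConfigMaps and Secrets inside each namespace, so the same service manifests can be applied everywhere.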
Another strong argument for using Kubernetes is its declarative approach to infrastructure. You declare the services that it should run, and it will do its best to run them. If for some reason the service goes down, Kubernetes will boot a new instance to fulfill the declared state.
So you might ask, “EKS helps to deploy and maintain services, but how will the different services talk with each other?”
Kinesis
With the Kinesis service, you can route all your messages from one service to another and keep decoupled services interacting cleanly.
This helps when client requests don't all hit the same instance, or when the processing doesn't live in one instance and requires additional computation from other microservices.
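Producing an event is a single call. Here is a rough sketch with boto3; the stream name and payload are made up for illustration:

```python
# Sketch: publish an event to a Kinesis stream so other services can react to it.
import json
import boto3

kinesis = boto3.client("kinesis")

kinesis.put_record(
    StreamName="orders-events",          # hypothetical stream name
    Data=json.dumps({"type": "OrderCreated", "orderId": "42"}).encode("utf-8"),
    PartitionKey="42",                   # records with the same key keep their ordering
)
```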
Another option that AWS provides for service communication is SNS and fan-out SQS (Simple Queue Service) subscriptions. You can read more on how to set it up here.
The simplest example is having one service process the data while another archives it. With a single SQS queue alone, this is not possible, because each message is consumed by only one worker; you need the SNS fan-out (or a Kinesis stream with multiple consumers) to deliver the same record to both services.
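A rough sketch of the fan-out wiring with boto3 follows; the topic and queue names are illustrative, and in practice you would also attach a queue policy that allows the topic to deliver messages:

```python
# Sketch: one SNS topic fanned out to two SQS queues (processing and archiving).
import boto3

sns = boto3.client("sns")
sqs = boto3.client("sqs")

topic_arn = sns.create_topic(Name="data-events")["TopicArn"]

for queue_name in ("data-processing", "data-archiving"):
    queue_url = sqs.create_queue(QueueName=queue_name)["QueueUrl"]
    queue_arn = sqs.get_queue_attributes(
        QueueUrl=queue_url, AttributeNames=["QueueArn"]
    )["Attributes"]["QueueArn"]
    sns.subscribe(TopicArn=topic_arn, Protocol="sqs", Endpoint=queue_arn)
```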
Also, Kinesis integrates deeply with DynamoDB and Lambda Functions, which are the next two topics.
DynamoDB
This is the go-to AWS solution for NoSQL document-based persistence. Besides the ability to store JSON objects, you can create indexes and search by them, making fetches very performant. If a certain index starts receiving a lot of usage as your application scales, you can increase the resources allocated to it in the table configuration.
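For example, a query against a secondary index might look like the sketch below; the table and index names are hypothetical:

```python
# Sketch: query a DynamoDB table through a global secondary index.
import boto3
from boto3.dynamodb.conditions import Key

dynamodb = boto3.resource("dynamodb")
table = dynamodb.Table("clients")          # hypothetical table name

response = table.query(
    IndexName="by-country",                # hypothetical GSI
    KeyConditionExpression=Key("country").eq("DE"),
)
for item in response["Items"]:
    print(item["name"])
```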
In our case, we have everything separated by domain entities representing the core concepts for our solution.
And the cherry on top is that DynamoDB can also pipe its changes to Kinesis. I think you can see where we are going: this event-driven architecture is perfect for serverless. Heck! Serverless was developed exactly for cases like this.
Lambda Functions (Serverless)
Another hot topic in the development world. Lambda functions are receiving a lot of love these days for being very simple to develop and deploy.
In our infrastructure, we use them as an auxiliary tool. They watch the Kinesis streams coming from our application in EKS and from DynamoDB, and they react in very specific ways to these events.
Let’s take, for example, how you can let your team know that a major client started using your application (a minimal handler sketch follows the list):
- The application saves a new client entry on DynamoDB
- DynamoDB pipes this event to a Kinesis stream
- The Kinesis stream triggers the Lambda function that was watching for this kind of event
- The Lambda function sends a friendly Slack message to the marketing channel saying we have a new cool client :)
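Here is what such a handler could look like. The Slack webhook URL and the exact shape of the change record are assumptions and depend on how the DynamoDB-to-Kinesis integration is configured:

```python
# Sketch: Lambda handler reacting to DynamoDB change records arriving via Kinesis.
import base64
import json
import os
import urllib.request

SLACK_WEBHOOK_URL = os.environ["SLACK_WEBHOOK_URL"]  # hypothetical env var

def handler(event, context):
    for record in event["Records"]:
        # Kinesis payloads arrive base64-encoded.
        payload = json.loads(base64.b64decode(record["kinesis"]["data"]))
        # Assumption: the change record carries the new client entry.
        if payload.get("eventName") == "INSERT":
            client_name = payload["dynamodb"]["NewImage"]["name"]["S"]
            message = {"text": f"New client just signed up: {client_name} :tada:"}
            request = urllib.request.Request(
                SLACK_WEBHOOK_URL,
                data=json.dumps(message).encode("utf-8"),
                headers={"Content-Type": "application/json"},
            )
            urllib.request.urlopen(request)
```

Because the function only reads the stream, it can fail or be redeployed without the main application ever noticing.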
Another example would be the update of Elasticsearch indexes.
Imagine that you use ES for a text-based search on an entity that is stored in your DynamoDB table (a short sketch of the indexing step follows the list):
- A new record is added to your entity table
- DynamoDB pipes this created event to a Kinesis stream
- A Lambda function listens to this stream and saves the information into the Elasticsearch index, making it available to be queried with simple text terms.
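The indexing side can stay equally small. This rough sketch writes the document straight to the Elasticsearch REST API; the endpoint, index name, and record shape are assumptions, and a production setup would also sign the request with IAM credentials:

```python
# Sketch: push a record from the Kinesis-triggered Lambda into an Elasticsearch index.
import base64
import json
import os
import urllib.request

ES_ENDPOINT = os.environ["ES_ENDPOINT"]  # hypothetical env var holding the domain endpoint

def handler(event, context):
    for record in event["Records"]:
        payload = json.loads(base64.b64decode(record["kinesis"]["data"]))
        doc = payload["dynamodb"]["NewImage"]           # assumed record shape
        doc_id = doc["id"]["S"]
        request = urllib.request.Request(
            f"{ES_ENDPOINT}/clients/_doc/{doc_id}",      # 'clients' index is illustrative
            data=json.dumps(doc).encode("utf-8"),
            headers={"Content-Type": "application/json"},
            method="PUT",
        )
        urllib.request.urlopen(request)
```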
As you can see, this is completely decoupled from the main application. It gives immediate value and improves visibility of what is happening.
Elasticsearch
Speaking of which, Elasticsearch is one of the best ways you can analyze your production logs and derive precise information about the health of your application. And, guess what — Amazon provides Elasticsearch as a service and, because it is built on top of EC2, you can scale it to meet your demands.
Unfortunately, this is the only piece so far that is not connected to Kinesis. So, to populate ES, we use Fluentd on our Kubernetes nodes, which normalizes the output of the containers and pushes it to ES.
Lambda functions that depend solely on ES data need to be triggered with a cron event, for example, every five minutes.
Besides asserting the health of your system, you can have different indexes (analogous to tables in more conventional databases) storing data and usage metrics. In combination with Kibana, the out-of-the-box analysis and visualization UI, you can derive important information about how your users use the product.
Step Functions
But how can we manage stateful execution flows?
Step Functions. This is a powerful tool for orchestrating executions that are time-based and need a built-in way to pause or cancel the operation.
Let’s say that we want to send a message to our users to remind them to finish the tutorial, but a user can start the tutorial on their own without our input. So we need a way to cancel the execution of this reminder. Step Functions allows exactly that: you control the execution flow, and, if the execution is no longer relevant, you can cancel it.
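From the application side, this is a sketch of what starting and cancelling such a flow looks like with boto3. The state machine ARN is a placeholder, and the state machine itself would contain a Wait state before the reminder step:

```python
# Sketch: start a reminder execution, and cancel it later if it becomes irrelevant.
import json
import boto3

sfn = boto3.client("stepfunctions")

# Kick off the reminder flow when the user signs up.
execution = sfn.start_execution(
    stateMachineArn="arn:aws:states:eu-west-1:123456789012:stateMachine:tutorial-reminder",  # placeholder
    input=json.dumps({"userId": "42"}),
)
execution_arn = execution["executionArn"]

# ...later, if the user starts the tutorial on their own:
sfn.stop_execution(executionArn=execution_arn, cause="User already started the tutorial")
```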
Elasticache
Step Functions are great, but we need to save the reference of the execution somewhere so the application can decide later if the execution should be canceled or not. Amazon has you covered on this as well.
For anything temporary (but not exclusively temporary), you can use this scalable service that provides a choice between Redis and Memcached.
We love to use Redis, and AWS delivers high-availability instances across as many replicas as you require. There are a couple of major differences between the two: Redis offers advanced data structures, while Memcached caps object sizes at 1 MB by default.
Choose what fits your use case. You can also combine them!
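Sticking with the reminder example, here is a sketch of how the execution reference can be parked in Redis. The cache endpoint and key naming are made up, and we assume the `redis` Python package and an ElastiCache instance reachable from the application:

```python
# Sketch: keep the Step Functions execution ARN in Redis so it can be cancelled later.
import boto3
import redis

cache = redis.Redis(host="my-cache.abc123.euw1.cache.amazonaws.com", port=6379)  # hypothetical endpoint
sfn = boto3.client("stepfunctions")

def remember_reminder(user_id: str, execution_arn: str) -> None:
    # Expire the key after a week; by then the reminder has either fired or been cancelled.
    cache.set(f"reminder:user:{user_id}", execution_arn, ex=7 * 24 * 3600)

def cancel_reminder(user_id: str) -> None:
    arn = cache.get(f"reminder:user:{user_id}")
    if arn:
        sfn.stop_execution(executionArn=arn.decode("utf-8"), cause="Tutorial already started")
```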
CloudWatch
If you are using any of the services mentioned above, you are generating logs that are pushed and aggregated into CloudWatch. With it, alarms and notifications can be set up so your team will not miss a status change in the infrastructure.
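For instance, a sketch of an alarm on Lambda errors that notifies an SNS topic; the alarm name, thresholds, and topic ARN are illustrative:

```python
# Sketch: alarm when a Lambda function starts failing, and notify the team via SNS.
import boto3

cloudwatch = boto3.client("cloudwatch")

cloudwatch.put_metric_alarm(
    AlarmName="slack-notifier-errors",                  # hypothetical alarm name
    Namespace="AWS/Lambda",
    MetricName="Errors",
    Dimensions=[{"Name": "FunctionName", "Value": "slack-notifier"}],
    Statistic="Sum",
    Period=300,
    EvaluationPeriods=1,
    Threshold=1,
    ComparisonOperator="GreaterThanOrEqualToThreshold",
    AlarmActions=["arn:aws:sns:eu-west-1:123456789012:ops-alerts"],  # placeholder topic ARN
)
```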
IAM (Identity and Access Management)
This is at the core of every service running on AWS and of every user accessing any kind of information. IAM defines the rules that each service runs under.
For example, a Lambda function can be triggered by events and perform certain actions. Responding to an event doesn’t require a specific rule, but to perform actions you need to state that this Lambda function can access a given service or resource.
The same goes for users. If you define that a particular user group doesn’t have permission to delete S3 files, they will not be able to perform this action.
When you configure these profiles and security groups, you can define them as broadly as you like. But when we are talking about production environments, the more granular the permission scope, the better, because it gives you tighter control over who is permitted to execute what.
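As a sketch, here is how a narrowly scoped policy for the Lambda function above might be created with boto3; the table ARN, policy name, and role name are placeholders:

```python
# Sketch: a granular policy that only lets a Lambda role read one DynamoDB table.
import json
import boto3

iam = boto3.client("iam")

policy_document = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": ["dynamodb:GetItem", "dynamodb:Query"],
            "Resource": "arn:aws:dynamodb:eu-west-1:123456789012:table/clients",  # placeholder
        }
    ],
}

policy = iam.create_policy(
    PolicyName="lambda-read-clients-table",
    PolicyDocument=json.dumps(policy_document),
)
iam.attach_role_policy(
    RoleName="slack-notifier-role",          # hypothetical Lambda execution role
    PolicyArn=policy["Policy"]["Arn"],
)
```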
VPC (Virtual Private Cloud)
Still speaking about security, one thing is to make sure a service or a user has the right permissions to do the required job. Another is to reduce the attack surface of your whole infrastructure, which also reduces the scope your security response team needs to cover. Remember: the more of your system is openly connected to the internet, the more vulnerable it is.
This is why AWS provides the Virtual Private Cloud, where you can define a granular network configuration for your externally and internally facing EC2 instances.
ELB (Elastic Load Balancer)
ELB is the standard AWS way to run a scalable, multi-instance application. With it, you can receive requests at a single endpoint and spread the load across multiple instances, so the application scales easily.
ECR (Elastic Container Registry)
If you are going to deploy a modern application, you are almost certainly talking about container image deployment. Since 2014, the industry has been engulfed in a revolution, and new technologies emerged to take advantage of what containers can bring.
But to deploy containers, you need a place to store the images. ECR provides exactly that. It’s as simple as pushing a new image, and, inside Kubernetes, you can declare a deployment that uses this specific image. Because of IAM, your Kubernetes cluster can pull the image without any additional configuration on your part.
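A rough sketch of such a deployment declaration using the Python Kubernetes client; the ECR image URI, names, and replica count are illustrative:

```python
# Sketch: declare a Kubernetes Deployment that pulls its image from ECR.
from kubernetes import client, config

config.load_kube_config()
apps_v1 = client.AppsV1Api()

container = client.V1Container(
    name="api",
    image="123456789012.dkr.ecr.eu-west-1.amazonaws.com/api:1.4.2",  # placeholder ECR URI
    ports=[client.V1ContainerPort(container_port=8080)],
)
deployment = client.V1Deployment(
    metadata=client.V1ObjectMeta(name="api"),
    spec=client.V1DeploymentSpec(
        replicas=2,
        selector=client.V1LabelSelector(match_labels={"app": "api"}),
        template=client.V1PodTemplateSpec(
            metadata=client.V1ObjectMeta(labels={"app": "api"}),
            spec=client.V1PodSpec(containers=[container]),
        ),
    ),
)
apps_v1.create_namespaced_deployment(namespace="production", body=deployment)
```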
S3 (Simple Storage Service)
When you need to deploy or store other artifacts besides container images, Amazon has you covered with S3. It is simple to use: just define a bucket and its permissions. It can be accessed from anywhere, on the internet or not, and you can upload new files to it.
Not too much to say about it. It is simple, fast, scalable, and costs close to nothing.
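For completeness, uploading an artifact is a one-liner with boto3; the bucket name, key, and local file are made up:

```python
# Sketch: upload a build artifact to an S3 bucket.
import boto3

s3 = boto3.client("s3")
s3.upload_file("dist/app-1.4.2.tar.gz", "my-artifacts-bucket", "releases/app-1.4.2.tar.gz")
```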
Amazon CloudFront
Since we are talking about the distribution of static files with S3, a service that is usually used in conjunction with it is the CDN (content delivery network) solution from AWS. It takes advantage of the global network that AWS built and caches your content near your user base, bringing down the overall download time for artifacts stored on S3. Not to mention the out-of-the-box DDoS mitigation and other handy features, like cache invalidation for when you need to roll out an update as fast as possible.
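An invalidation after such a rollout is a single call; the distribution ID and path are placeholders:

```python
# Sketch: invalidate cached paths on a CloudFront distribution after deploying new assets.
import time
import boto3

cloudfront = boto3.client("cloudfront")

cloudfront.create_invalidation(
    DistributionId="E1ABCDEF234567",                 # placeholder distribution ID
    InvalidationBatch={
        "Paths": {"Quantity": 1, "Items": ["/releases/*"]},
        "CallerReference": str(time.time()),         # must be unique per request
    },
)
```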
AWS Certificate Manager
If you are serving traffic over HTTPS (which you definitely should), you need to trust the service that provides and provisions your SSL/TLS certificates. One of the great advantages of the AWS solution, besides following all security standards and best practices, is the automatic renewal of certificates: if the certificate is Amazon-issued, renewal doesn’t need any input on your part.
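Requesting such a certificate is one boto3 call; the domain is illustrative, and DNS validation then happens through a record you create in Route 53 or your DNS provider:

```python
# Sketch: request a DNS-validated certificate from AWS Certificate Manager.
import boto3

acm = boto3.client("acm")

response = acm.request_certificate(
    DomainName="app.example.com",                    # illustrative domain
    ValidationMethod="DNS",
    SubjectAlternativeNames=["*.app.example.com"],
)
print(response["CertificateArn"])
```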
To wrap things up, we recommend AWS not only as a building tool, but also as the foundation for a solid, scalable, and modern application.
If you have a startup that values cutting-edge technologies and delivery speed is a must, subscribe to the AWS Activate program to gain access to premium support from Amazon, credits to scale up easily, and a community of innovative companies.