The Serverless Journey at NorthOne
While many companies today are experimenting with serverless architecture in various forms, NorthOne has the distinction of being (almost!) entirely serverless. From a developer perspective, that’s a pretty special thing. Serverless architecture is the near-total abstraction of the server layer from the application layer. There are a lot of fun problems to solve in traditional server architecture, but every “good” problem is inevitably countered by ten “bad” ones, typically solved in obscurity with a lot of effort and low gain. Because NorthOne has embraced serverless development as fully as we have, our biggest gains are in the things we don’t think about, the services we don’t have to manage, and the layer of complexity that is already solved.
It’s easy to imagine developers writing lines of code to solve a problem, but in a more traditional server-based stack, a lot of work goes into the foundational architecture on which that code runs. For even a single server, you have to answer a staggering number of questions, like: What performance-based hardware requirements do you have? What kind of operating system fits best? How do you keep changes or request volume from slowing down (or taking down!) the server? With server-based stacks, everything also needs to be provisioned for worst-case scenarios, ultimately costing time and money for unused processing (but hey, better safe than sorry, right?).
Serverless, on the other hand, takes away all that complexity, and leaves you with a baseline of performance and reliability so that your code is just your code! What kind of hardware? What kind of operating system? Don’t know, don’t care. AWS Serverless takes care of all that for us, enabling us to focus on the code that actually makes a difference for our business. Much like building with LEGO pieces, our architectural development becomes a highly composable yet systematic process: we write code to handle the input, execution, and output for a given request. Scale is still considered, but the vast AWS ecosystem allows us to leverage easily configurable services that are most appropriate for our use-case.
There are a lot of great services to solve interesting problems in the AWS Serverless catalog, and far too many to cover in just one post, so I’m going to focus on just three of my favorites: SQS, Lambda, and DynamoDB.
Our Favorite Pieces
SQS, or Simple Queue Service, is one of the oldest AWS services, and one of the core services for scalability. It’s a queue, yes, but from a development perspective, it’s an ideal store for transient data. If you think of data flowing through our systems as water flowing downwards, SQS is like a massive funnel. We can collect any number of messages from any number of sources, and we run code to “drain” messages at the bottom of that funnel. As the volume of messages going into the queue dips or spikes, we define the rate at which we pull those messages to drain that funnel, providing message output at a consistent rate for variable input. We open, close, or limit the tap with the click of a button!
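To make the funnel idea concrete, here is a minimal sketch in plain Python. It is a toy simulation, not SQS itself: the `drain` function, the arrival numbers, and the drain rate are all made up for illustration. It shows how a bursty input turns into a capped, steady output while the backlog waits in the queue.

```python
from collections import deque


def drain(arrivals_per_tick, drain_rate):
    """Simulate a queue that absorbs bursty input and is drained at a capped rate.

    arrivals_per_tick: messages arriving in each time step (hypothetical numbers).
    drain_rate: maximum number of messages the consumer pulls per time step.
    Returns (processed_per_tick, backlog_per_tick).
    """
    queue = deque()
    processed, backlog = [], []
    next_id = 0
    for arrivals in arrivals_per_tick:
        # Producers push whatever they have; the queue simply absorbs it.
        for _ in range(arrivals):
            queue.append(next_id)
            next_id += 1
        # The consumer never takes more than drain_rate messages per tick.
        pulled = min(drain_rate, len(queue))
        for _ in range(pulled):
            queue.popleft()
        processed.append(pulled)
        backlog.append(len(queue))
    return processed, backlog


# A spike of 12 messages is smoothed into batches of at most 4.
processed, backlog = drain([0, 12, 1, 0, 0, 3], drain_rate=4)
print(processed)  # [0, 4, 4, 4, 1, 3]
print(backlog)    # [0, 8, 5, 1, 0, 0]
```

The downstream code only ever sees up to four messages per tick, no matter how large the spike was.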
SQS also facilitates a number of healthy practices for failure scenarios. The dream is that all the code we write works the first time all the time, but in practice, that’s never quite true. Any code can fail for a number of reasons, including:
- The code has a bug and doesn’t work in certain cases (logic error)
- The code connects to an external service that does not respond to every request (rate limited)
- The code connects to something externally that is no longer available (service outage).
SQS distinguishes between a message being read (received) from the queue and being removed (deleted) from it. We write code to read it, and if that code succeeds, the message is removed automatically. If that code fails for any reason, no problem! The message is still in the queue, just invisible to other consumers for a period of time, after which it becomes available to be read again.
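The read-then-remove behaviour can be sketched with a small in-memory stand-in. This is an illustration only: `ToyQueue`, its method names, and the 30-second timeout are assumptions for this example, not the SQS API. A received message is hidden for a while rather than removed, and it only disappears once the consumer explicitly deletes it.

```python
class ToyQueue:
    """In-memory stand-in for a queue with a visibility timeout (not the SQS API)."""

    def __init__(self, visibility_timeout):
        self.visibility_timeout = visibility_timeout
        self._messages = {}  # message id -> [body, time at which it is visible again]
        self._next_id = 0

    def send(self, body):
        self._messages[self._next_id] = [body, 0]
        self._next_id += 1

    def receive(self, now):
        """Return (id, body) of one visible message and hide it, or None."""
        for msg_id, entry in self._messages.items():
            if entry[1] <= now:
                entry[1] = now + self.visibility_timeout
                return msg_id, entry[0]
        return None

    def delete(self, msg_id):
        """Remove a message for good; called only after successful processing."""
        self._messages.pop(msg_id, None)

    def __len__(self):
        return len(self._messages)


queue = ToyQueue(visibility_timeout=30)
queue.send("onboard customer 42")

first = queue.receive(now=0)    # first attempt; pretend the handler crashes here
hidden = queue.receive(now=10)  # still inside the timeout: nothing is visible
retry = queue.receive(now=30)   # timeout elapsed: the same message is delivered again
queue.delete(retry[0])          # second attempt succeeded, so the message is deleted
```

Because the failed first attempt never called `delete`, nothing was lost: the message simply reappeared once the timeout elapsed.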
That covers us for intermittent failures, but what about more permanent failures in the case of a code error or service outage? SQS has patterns for a secondary queue referred to as a “Dead-Letter Queue” (DLQ). We hope for the best but prepare for the worst. Any message that has been read a certain number of times without being removed is automatically drained into the DLQ, keeping our main queue healthy and fluid. We are able to monitor the DLQ for even a single message. At that point, we can either have other code handle the error, assuming we know what it might be, or let the messages pool until a human can intervene and identify the problem. Is the problem caused by a service outage? No problem, we can wait until it’s back on, and then drain the DLQ back into the main queue! Is the problem caused by a code error? We just need to deploy a fix, then do the exact same thing, and nothing is lost in the process.
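Here is a rough sketch of that retry-then-park flow, again as a toy model in plain Python rather than real SQS calls. The function `process_with_dlq`, the handlers, and the record names are hypothetical; in SQS itself the threshold corresponds to the `maxReceiveCount` setting in a queue’s redrive policy.

```python
def process_with_dlq(messages, handler, max_receive_count):
    """Retry each message up to max_receive_count times, then park it in a DLQ.

    A toy model of dead-letter behaviour, not the SQS API.
    `handler` is any callable that raises an exception on failure.
    """
    main_queue = [(body, 0) for body in messages]  # (body, receive count)
    dead_letter_queue = []
    done = []
    while main_queue:
        body, receives = main_queue.pop(0)
        receives += 1
        try:
            handler(body)
            done.append(body)
        except Exception:
            if receives >= max_receive_count:
                dead_letter_queue.append(body)       # give up and keep it for inspection
            else:
                main_queue.append((body, receives))  # back of the queue for another try
    return done, dead_letter_queue


def buggy_handler(body):
    # Hypothetical bug: one record has a shape the code did not anticipate.
    if body == "record-2":
        raise ValueError("unexpected shape")


def fixed_handler(body):
    # After the fix is deployed, every record is handled.
    return None


done, dlq = process_with_dlq(
    ["record-1", "record-2", "record-3"], buggy_handler, max_receive_count=3
)
# Once the fix is out, drain the DLQ back through the same path.
redriven, still_dead = process_with_dlq(dlq, fixed_handler, max_receive_count=3)
```

The failing record never blocks the other two, and after the fix it goes through the same code path as everything else.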
We recently performed a major back-end service migration for onboarding new customers in order to facilitate future development. This is, understandably, a pretty stressful scenario to go through, especially for a core service. We needed to make sure that the onboarding service was down for as little time as possible (is there a way we can migrate without taking down the service at all?), that the data we needed was fully migrated and transformed as necessary from the old service to new, and that we could easily identify if anything went wrong. Luckily, our onboarding service already had an SQS queue set up to consolidate data early in the process. When we turned off the original back-end service, messages simply sat in the queue while we performed the data migration. When ready, we fed that queue into the new service.
The data migration also required us to validate our onboarding records and, if eligible, attempt to copy each record to the new service. If it failed for any reason, we wanted to know about it, but we also didn’t want it to block any other messages. So, we added every record into an SQS queue where they could be processed individually. If a record couldn’t be migrated for a reason we didn’t anticipate, it was safely retried a number of times before being collected in a DLQ, where the dev team could identify what went wrong for that record and how to fix it after the migration. No sweat!
It’s nice to have pre-built solutions to our problems, but we also need to be able to run our own code. For this, AWS Lambda is the go-to code solution. Application code can be unruly, and even the best-intentioned code-base can be hard to manage over time. Lambda functions, on the other hand, encourage us to write small bursts of code to solve a single problem when they run. Remember when we needed to migrate customer records from one service to another in the previous example? With Lambda, we were able to write one function that is specifically tailored to that problem for a single record. When we need to run that code, it doesn’t matter whether we trigger it once or one thousand times in parallel. A single function that takes too long, or fails, has no impact on the other functions being executed because they don’t share any resources under the hood!
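As a rough illustration, a single-purpose function for this kind of migration might look like the sketch below. It is not our actual migration code: `migrate_record` and the `customer_id` field are hypothetical placeholders. The handler assumes an SQS trigger, which delivers messages under `Records` with a `messageId` and a `body`, and it assumes partial batch responses are enabled on that trigger so that only the failed messages are retried.

```python
import json


def migrate_record(record):
    """Hypothetical single-record migration; raises if the record is not eligible."""
    if "customer_id" not in record:
        raise ValueError("missing customer_id")
    return {"customer_id": record["customer_id"], "migrated": True}


def handler(event, context):
    """Lambda entry point for an SQS trigger: process each record independently."""
    failures = []
    for message in event["Records"]:
        try:
            migrate_record(json.loads(message["body"]))
        except Exception:
            # Report only this message as failed so the rest of the batch is not retried.
            failures.append({"itemIdentifier": message["messageId"]})
    return {"batchItemFailures": failures}


# Local usage example with a hand-built event (no AWS account needed).
event = {
    "Records": [
        {"messageId": "a", "body": json.dumps({"customer_id": 1})},
        {"messageId": "b", "body": json.dumps({"name": "no id"})},
    ]
}
result = handler(event, None)
print(result)  # {'batchItemFailures': [{'itemIdentifier': 'b'}]}
```

Keeping the function this small is the point: it knows how to migrate one record, and the queue and Lambda decide how many copies run at once.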
As a database, DynamoDB is another incredible serverless service from AWS which we rely on heavily to achieve scalability. To provide a bit of context, it’s important to understand that historically, data storage has been expensive. It’s easy to see the price reduction over time when we compare the average hard drive from the 90s with today’s, where we can purchase vastly more storage for a fraction of the price. It might surprise you to learn that the most widespread database patterns today originated from that same time, and continue to optimize for storage efficiency rather than performance.
DynamoDB, on the other hand, takes a radically different approach to storing data in order to optimize for performance rather than storage.
A good analogy for this is living in a house with roommates. Imagine it’s a small place, and it has a single pantry. You and your roommates are fairly tidy people, but you all have pots, pans, ingredients, cleaning products, etc. that you need to store. All your pots are different shapes and sizes, but you can place them inside each other like Russian nesting dolls. You do something similar with your perishables and your kitchen products in order to get the most out of the space. This is an organized way to store things, and perfect if you and your roommates never need to add or remove things, but real life isn’t like that. You and your roommates all need to get your pans when you cook, so you have to move all the pans to access yours. When you put it back, you do the same. And what about when your roommate needs a lot of different things and he’s already in the pantry? You’ll need to wait for him to finish before you can get your own things.
Now, let’s say you and your roommates move to a big new house, containing multiple pantry rooms. You could have a pantry for every type of thing: a pantry for pots and pans, a pantry for cleaning products, a pantry for food, etc. That’s much more organized, but chances are you and your roommates will use some of those pantries more than others; you’ll still be in each other’s way every time you need to get cereal for breakfast!
What if there were enough pantries for each roommate to have their own pantry? Would you still want to share space, or would you rather be able to walk into a pantry knowing everything is yours? If you need your favourite pan, you simply walk in and grab it, and you put it in the same place when you’re done. Your roommates keep their things in their pantries, you keep your things in yours, and it doesn’t matter who needs what or how long anyone takes in their own pantry to get what they need; you aren’t bumping shoulders every time you need similar things. Most services we build take advantage of this kind of pattern, which allows our data volume and cost to grow linearly, while our performance remains constant (a rare thing in databases).
Consider an example of data for a workout-tracking app. A conventional database pattern might include the following tables (or pantries from our analogy above):
- User Data
- User-Routine (including timestamp and record of completion)
That’s an organized approach and very storage efficient, but it also means a lot of users will be reading, writing, and updating the same user-routine table, ultimately slowing down the process. Furthermore, there may be variance in routines: existing routines can change, and new routines can be added with new definitions for completion, ultimately modifying the definition of both in the user-routine table.
An expert examining the scenario above will tell you that there are a lot of sophisticated patterns that can be applied to mitigate the performance hit, but why bother? A DynamoDB approach to this would be to consider that the way this data is accessed is by user and date/time. So what if the data was stored by user first and date/time second? The user can have their own section of the database to store their full routine for a given date/time.
We stop worrying about the volume of created or updated records because no two users are ever in the same space. The routine definition can be included in each record at the time of completion, so we never have to worry about how changes to the routine definition over time affect historical records. With better performance at scale and a simpler implementation, everyone wins.
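The “user first, date/time second” layout can be sketched with a small in-memory model. This is a toy illustration, not the DynamoDB API: `ToyTable`, the field names, and the sample data are all assumptions. The user ID plays the role of a partition key, the completion timestamp plays the role of a sort key, and each record carries its own copy of the routine as it was at the time.

```python
from collections import defaultdict


class ToyTable:
    """In-memory model of a partition-key + sort-key table (not the DynamoDB API)."""

    def __init__(self):
        # partition key (user) -> {sort key (timestamp) -> item}
        self._partitions = defaultdict(dict)

    def put(self, user_id, completed_at, routine):
        # The routine definition is copied into the record at completion time,
        # so later changes to the routine never rewrite history.
        self._partitions[user_id][completed_at] = {
            "user_id": user_id,
            "completed_at": completed_at,
            "routine": dict(routine),
        }

    def query(self, user_id, start, end):
        """Return one user's items with start <= timestamp <= end, oldest first."""
        partition = self._partitions.get(user_id, {})
        # ISO 8601 timestamps in the same format sort correctly as plain strings.
        return [partition[key] for key in sorted(partition) if start <= key <= end]


table = ToyTable()
table.put("alice", "2024-01-05T07:00:00Z", {"name": "Leg day", "sets": 3})
table.put("alice", "2024-01-12T07:00:00Z", {"name": "Leg day", "sets": 4})  # routine changed later
table.put("bob", "2024-01-05T18:30:00Z", {"name": "Cardio", "minutes": 30})

first_week = table.query("alice", "2024-01-01T00:00:00Z", "2024-01-07T23:59:59Z")
```

Reading one user’s week touches only that user’s section of the table, and the older record still shows three sets even though the routine was later changed to four.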
Building Blocks for the Future
One of the greatest benefits of a serverless stack doesn’t come from a single service, but instead from the ability to adapt and evolve the patterns we use. Every new product we build is an opportunity to make our technology better, faster, and more resilient (and we’re always building new products!).
As we at NorthOne continue to grow and expand our expertise and platform, we expect our serverless capabilities to grow with us, allowing us to stay ahead of the curve on our data initiatives.
If you are interested in joining NorthOne, or just want to learn more about what it is like to be an engineer on our team, connect with us here.