I have been doing web development for a long time. In the early days that meant editing HTML without CSS, just font tags and animated gifs in Notepad, uploaded via FTP to my ISP provided "web space".
Web pages were basic and very simple to construct. I learned HTML through experimentation and "view source" along with millions of others.
There were few tools to choose from, no web server software to worry about, and it was mostly a game of making sure your page rendered correctly in each browser as new ones were released.
In the 28 years since there has been a web development explosion. There are a million tools, too many frameworks, and a thousand different ways to approach and solve every problem.
I have tried a small fraction of the available options, yet have the battle scars of what worked and did not work during those years. That means for every new project I try and start, I am crippled by that experience.
I do not have the unbridled optimism or naive outlook of a new developer who has never tried anything before. The young upstart at school or college downloading Node for the first time, getting their first Python function working, or wondering why there are so many things to learn before you can even write a line of code.
Instead my brain sees nothing but problems. The function that will not scale properly. The server configuration that works now, but which offers no high availability. The dependency tree which is not pinning any versions, so is just one npm attack away from oblivion.
Decisions are easier when you know nothing.
There is an adage in software development:
- First you make it Work
- Then you make it Good
- Then you make it Fast
Of course it is not always that straightforward. It may not be good, if it is not fast. And it may only be possible to make it fast at the expense of making it good.
But that does not mean it is not a useful way to describe the process to someone who is not experienced in the suffering of the average software developer.
What is often missing in the explanation to the non-experienced manager is that the time between each of those steps may be double the previous.
Imagine a form where a user is going to enter a delivery address.
A basic working version may have all the necessary fields, a name, a couple of generic address fields, perhaps a city, county, postcode and country. It will save to the database on submission.
But making it good and making it fast require significantly more effort. Addresses are complex. Maybe some form of postcode search or global address autocomplete would be better, but that's a big database to manage yourself. Validation sounds easy, but not every country has postcodes. And even searching for your country in a list is a pain if you have to scroll through a lot of them just to find the United Kingdom. Perhaps the most common countries that use your website should go at the top.
There are lots of little decisions that make it good. And it takes some skilled technical know-how to make it fast.
When you want me to estimate something, is it just to make it work? Or do you want it to be good? And if you want it to be fast, then how long have you got?
It may take a week to make the feature work. Two weeks to make it good. And four weeks to make it fast.
The quality of a software product is often determined by which of those the manager is willing to bear.
And the quality of a software developer is often determined by how much they care about making it good.
In the early days (or years) of a SaaS product you are fighting for product market fit. Even when you know without a doubt that your product serves a purpose that people are willing to pay for, there's a long road between Minimum Viable Product (MVP) and a mature piece of software.
If all it takes to onboard a new customer is filling in a couple of fields on a form and choosing some configuration options - whether done by the customer or one of your customer success team - that's Product Delivery.
But if you're doing bespoke development, adding features just for them, building new reports, or even doing on-site training - that's Project Delivery.
They may seem similar at first. After all, either way, you have to deliver something to the customer. This may be compounded if some of your customers want a product and some want a project.
But while treating product delivery like a project may result in some additional paperwork and otherwise do no harm, trying to perform project delivery with a product delivery toolset will result in a very disappointed customer.
Projects need plans. They need tooling that can show you when things are going wrong. They need discovery, goals, risks and a clear list of responsibilities.
Products just need billed. Which is why everyone just wants to deliver products.
Like every software engineer, I spend a lot of time looking online for solutions to problems (usually via Google, but more recently Kagi and Claude). I am eternally grateful to other engineers who have taken the time to write about their solutions. Especially those who cover the why as well as the how.
In that spirit, I'm going to break down the process I've been using for the past few years to ship code into production for the SaaS products I'm involved with. They're all Python and hosted in AWS, but that's not required for this approach.
Local Development
I spent many years dealing with a development server and database in the cloud, where the whole team used svn (with a fantastic homegrown web front-end that preceded Github by many years) to create branches and do their work. It generally worked fine, until somebody broke the database or the development server collapsed.
I'm now a huge fan of Docker-based local development. Whatever language and database you're working with, if you can get everyone running your application locally through Docker, it's going to be a positive experience. A simple compose file to get everything launched, VS Code for editing, and no central resource for everyone to worry about.
I'm increasingly a fan of making our software 100% local, back to that "famous" 5-minute Wordpress install I talked about yesterday. That means a developer should be able to get the whole thing running locally via Docker, and the whole application should work without any AWS access. That may sound simple, but it means abstracting certain features and sometimes not using AWS services. For example, if you upload to S3 in production you'll need to abstract that in your code so that somebody running locally can still upload and retrieve files from local disk.
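One way to sketch that S3 abstraction in Python is below. It is only an illustration, not the code we run: the class names, the STORAGE_BACKEND, S3_BUCKET and LOCAL_STORAGE_DIR variables, and the ./uploads default are all made up for the example. The S3 side assumes boto3 is installed in the AWS environments and imports it lazily, so a developer running locally never needs it.

```python
import os
from pathlib import Path
from typing import Protocol


class FileStorage(Protocol):
    """Minimal interface the rest of the application codes against."""

    def save(self, key: str, data: bytes) -> None: ...

    def load(self, key: str) -> bytes: ...


class LocalDiskStorage:
    """Stores files under a local directory; needs no AWS access."""

    def __init__(self, root: str) -> None:
        self.root = Path(root)

    def save(self, key: str, data: bytes) -> None:
        path = self.root / key
        path.parent.mkdir(parents=True, exist_ok=True)
        path.write_bytes(data)

    def load(self, key: str) -> bytes:
        return (self.root / key).read_bytes()


class S3Storage:
    """Stores files in an S3 bucket; only constructed when configured."""

    def __init__(self, bucket: str) -> None:
        import boto3  # lazy import: local runs do not need boto3 installed

        self.bucket = bucket
        self.client = boto3.client("s3")

    def save(self, key: str, data: bytes) -> None:
        self.client.put_object(Bucket=self.bucket, Key=key, Body=data)

    def load(self, key: str) -> bytes:
        response = self.client.get_object(Bucket=self.bucket, Key=key)
        return response["Body"].read()


def storage_from_env(env=None) -> FileStorage:
    """Pick a backend from (hypothetical) environment variables."""
    env = os.environ if env is None else env
    if env.get("STORAGE_BACKEND", "local") == "s3":
        return S3Storage(env["S3_BUCKET"])
    return LocalDiskStorage(env.get("LOCAL_STORAGE_DIR", "./uploads"))
```

The application only ever calls save and load, so the local default works with no credentials and no network, and production switches backend purely through configuration.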
They should also be able to run it without an internet connection. If your team want to work in the middle of nowhere, they should be able to. That means vendoring all your JS and CSS libraries, rather than loading them from public CDNs.
You'll be surprised how you naturally get better practices from following that approach.
Git Process
I want the interaction with Git to be as minimal as possible (one of the reasons for not using GitFlow) and am a happy user of the Github Desktop application. I want to minimise merge conflicts and I want to minimise the cognitive load for the team.
With that in mind, our process is simple. The development team branch off main, do their work in a feature branch, and then merge back to main again.
We always consider the main branch to be deployable. Now what does deployable mean?
- We don't merge half-complete implementations. And if we do, they're hidden behind feature flags/configuration values.
- We have automated tests (both unit tests and integration tests via Playwright) to make sure that nothing is broken.
- Every job goes through a peer-review process before being merged (via Github).
- There's no cherry-picking. Once it's in main, it's going to be deployed. You may occasionally roll something back again if we're not quite ready for it, but you can't decide to move only some of main forward into production.
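The feature flag idea from the first point can be as small as an environment variable check. This is a minimal sketch rather than our real implementation; the FEATURE_ prefix, the flag name and the checkout_steps function are invented for the example.

```python
import os


def flag_enabled(name: str, env=None) -> bool:
    """Return True when FEATURE_<NAME> is set to a truthy value."""
    env = os.environ if env is None else env
    value = env.get(f"FEATURE_{name.upper()}", "")
    return value.strip().lower() in {"1", "true", "yes", "on"}


def checkout_steps(env=None) -> list[str]:
    """Hypothetical caller: the unfinished step only appears when flagged on."""
    steps = ["basket", "delivery address", "payment"]
    if flag_enabled("address_autocomplete", env):
        steps.insert(1, "address autocomplete")
    return steps
```

With the flag unset, the half-built step is merged to main but invisible, so main stays deployable.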
Once a job is merged to main, we run a build process that creates a Docker image. This happens via a combination of CodePipeline and CodeBuild, and the resulting image is uploaded to ECR. But you could easily do this with Github Actions and any private Docker registry.
The image gets tagged with the commit-id from Github and the newest image is always tagged with latest.
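As a rough illustration of that tagging scheme, the helper below works out the two tags for a build and the docker commands that would apply them. The function names and the example registry URI in the usage note are placeholders, and in our case the equivalent steps live in the CodeBuild configuration rather than in a Python script.

```python
def image_tags(repository_uri: str, commit_id: str) -> list[str]:
    """Every build gets an immutable commit tag plus the moving 'latest' tag."""
    commit_id = commit_id.strip()
    if not commit_id:
        raise ValueError("commit id is required")
    return [f"{repository_uri}:{commit_id}", f"{repository_uri}:latest"]


def build_commands(repository_uri: str, commit_id: str) -> list[list[str]]:
    """Return the docker build and push commands as argument lists."""
    tags = image_tags(repository_uri, commit_id)
    build = ["docker", "build"]
    for tag in tags:
        build += ["--tag", tag]
    build.append(".")  # build context: the repository root
    pushes = [["docker", "push", tag] for tag in tags]
    return [build, *pushes]
```

Calling build_commands("registry.example.com/my-app", "abc123") gives one build command carrying both tags, followed by a push for each. The commit tag is what makes rollbacks and hotfixes possible later, because it always points at exactly one build.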
We have separate development and production AWS accounts (and you should too), but the important thing is that the resulting image from the build process is shared to both environments. ECR lets you create those cross-account sharing rules.
However, the image is only deployed automatically to the development environment because we have stakeholders.
Stakeholders
What/Who are stakeholders?
These are the people that care whether or not your software is any good. They may be project managers. They may be product owners. They may be a QA department. They may be customers, depending on how you sell your software.
Despite all your tests, peer-reviews and developer best-efforts - stakeholders want to see your hard work before it's pushed out to the whole world. Just because the code passes your tests doesn't mean it's good.
They're also not technical, so they can't pull the code down themselves and build their own Docker containers locally. Or maybe they could, but I'm assuming the thought of having to support that gives you the fear.
So we have a development environment they can access to see the latest version of the application. They can clearly see what's going to be deployed next.
And because the Docker image is shared between the development and production environments, it's exactly the same code that's going to be deployed to both. There's no chance of them seeing something in dev, and then something else getting deployed to production. Your QA team will like that.
Deployment
We use CodeDeploy with Fargate on ECS. It's blue/green, so we deploy a complete set of new containers and make sure they're OK before updating the load balancer to point at the new IP addresses and destroying the old ones. CodeDeploy manages all of this for us, which is fantastic.
However... this would work just as well with Github Actions SSHing into a box in Digital Ocean. It's just a Docker image you need to put on your server.
Production Deployment
Assuming everyone is happy and it's time to do a deploy, we add a release tag on to the image within ECR in production. This automatically starts another CodePipeline/CodeDeploy combo that deploys that image with the same blue/green process.
We try and deploy once or twice a week, but that does rely a lot on the stakeholders.
Rollback
Easy, just tag a previous image with the release tag to deploy that one instead.
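Both the production deploy and the rollback come down to moving the release tag onto an image that already exists. A hedged sketch of doing that with boto3 is below: it reads the manifest of the image tagged with a commit id using batch_get_image and re-registers it under the release tag using put_image. The function name and the repository name in the usage note are made up, and the client is passed in so the function can be exercised without AWS.

```python
def promote_image(ecr_client, repository: str, commit_id: str,
                  release_tag: str = "release") -> None:
    """Point release_tag at the image currently tagged with commit_id."""
    response = ecr_client.batch_get_image(
        repositoryName=repository,
        imageIds=[{"imageTag": commit_id}],
    )
    images = response.get("images", [])
    if not images:
        raise LookupError(f"no image tagged {commit_id!r} in {repository!r}")
    # Re-putting the same manifest under a new tag moves the tag;
    # no image layers are rebuilt or re-uploaded.
    ecr_client.put_image(
        repositoryName=repository,
        imageManifest=images[0]["imageManifest"],
        imageTag=release_tag,
    )
```

Usage would look something like promote_image(boto3.client("ecr"), "my-app", "abc123"). A rollback is the same call with an older commit id.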
Database Migrations
We use Alembic (with Flask), and we run the Alembic upgrade command within CodeBuild in the CodePipeline before CodeDeploy. If this step fails, we stop the pipeline and don't deploy the image; a DBA then needs to look at the database to see what happened. Since this is all within CodeBuild, it's easy to look at the logs in CloudWatch and see why the command errored.
This also means that we consider our database migrations to be a pre-deploy step, not a post-deploy step. The important thing to know about that is that database migrations have to be backwards compatible, as your database is going to change before the code does. If the migrations work, but the deploy fails, you still want the application to continue running even though you added additional columns.
The alternative would be a post-deploy step, but then your new code goes live and it needs to be able to cope with the columns it relies on not being there yet. And that's harder to work with (tried that!).
Pre-deploy also means that you should do migrations that drop columns separately. Consider migrations to be additive, and then worry about cleanup migrations to remove deprecated fields later.
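To make the "additive first" rule concrete, here is a small self-contained demonstration. It uses Python's built-in sqlite3 module instead of Alembic and PostgreSQL purely so that it runs anywhere, and the customer table and county column are invented for the example. The point is that the release still running in production keeps working after the migration has been applied.

```python
import sqlite3


def old_release_insert(conn: sqlite3.Connection, name: str) -> None:
    """Query shape used by the release that is still running."""
    conn.execute("INSERT INTO customer (name) VALUES (?)", (name,))


def additive_migration(conn: sqlite3.Connection) -> None:
    """Pre-deploy migration: adds a nullable column and removes nothing."""
    conn.execute("ALTER TABLE customer ADD COLUMN county TEXT")


def demo() -> list[tuple]:
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT NOT NULL)"
    )
    old_release_insert(conn, "before migration")
    additive_migration(conn)
    # The old code has not been redeployed, yet its insert still succeeds
    # because the new column is nullable and nothing it uses has gone away.
    old_release_insert(conn, "after migration")
    return conn.execute(
        "SELECT name, county FROM customer ORDER BY id"
    ).fetchall()
```

Had the migration dropped or renamed the name column instead, the second insert would fail, which is why the cleanup belongs in a later migration once no running code depends on the old column.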
Pre-deploy also makes it easier for developers to test locally, as they can run the migrations and then switch back to the main branch and run tests to confirm everything is still fine.
And backwards compatible migrations are what make rollbacks easier.
Half-completed migrations can be a pain to resolve, which is a good reason to use PostgreSQL instead of MySQL, since PostgreSQL supports transactional DDL.
Hotfixes
Sometimes you need to fix a bug in production but you don't want to do a full deploy, because a full deploy will also drag along everything else that's been merged to main since your last production deploy, and your stakeholders aren't ready for that yet.
To resolve this we have a CodeBuild configuration that builds an image whenever a branch with "hotfix" in its name has some code merged into it. It goes something like this:
- Follow the usual process for creating a branch off main and fix your bug.
- Merge it back into main.
- Create a new branch off main called hotfix-something.
- Create a branch off hotfix-something called my-urgent-fix.
- Cherry pick the commit from main containing your fix and put it into your my-urgent-fix branch.
- Do a PR to pull from my-urgent-fix into hotfix-something.
When that PR is merged, CodeBuild sees the name of the branch and creates a new image using the usual process and tags it as normal with the commit id, but doesn't deploy it to development (because the development environment already contains the fix, as the fix is already in main). And then we just tag that as release and deploy it to production, then delete the hotfix branch.
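The branch rule described above boils down to a small decision: main builds and deploys to development, anything with "hotfix" in its name builds but skips development, and everything else is ignored. The sketch below shows that logic in Python. In our setup it is expressed in CodeBuild configuration rather than code, so the BuildPlan type and plan_for_branch function are illustrative names only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BuildPlan:
    build_image: bool
    deploy_to_development: bool


def plan_for_branch(ref: str) -> BuildPlan:
    """Decide what a merge to the given branch should trigger."""
    branch = ref.removeprefix("refs/heads/")
    if branch == "main":
        # Normal flow: build, tag with the commit id, deploy to development.
        return BuildPlan(build_image=True, deploy_to_development=True)
    if "hotfix" in branch:
        # Hotfix flow: build and tag only. Development already has the fix
        # because it was merged to main first.
        return BuildPlan(build_image=True, deploy_to_development=False)
    # Feature branches never produce deployable images.
    return BuildPlan(build_image=False, deploy_to_development=False)
```

The working branch my-urgent-fix deliberately does not match, so an image is only built once the PR lands on hotfix-something.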
Alternative Flows
There are plenty other approaches, so to round this out I'll give a few reasons why we don't use them.
- Github Flow. If you don't have stakeholders, this is as simple as it gets. But if you have any team at all, I think it's useful for people to see something in a preview environment before it goes live. I'm also not ready to trust that tests are 100% reliable.
- GitFlow. It's so complicated! There's so much merging going on between steps that the whole thing creates a spaghetti mess of commit history, followed by the very high likelihood that you'll screw something up somewhere. Worst of all, what gets deployed to production is not the same as what was deployed to development, because there's a merging step between branches done along the way.
- OneFlow. This one is pretty good for the stakeholder lifestyle, because it has release branches that you could deploy for them to see before moving them forward. It is a bit more complicated than what we're doing, and while release previews might seem straightforward, they're incredibly difficult to create a database for.
Conclusion
It works for us! It may not work for you, but it's been a really positive experience for our team.
Pull previews are the next step, to automatically spin up a preview environment (using the local docker compose file that developers use) on EC2 when a Github PR is created. I'll write more about that when the time comes.