High Availability Architecture on AWS

A comprehensive guide to building a fault-tolerant, auto-scaling web application architecture on AWS using Terraform.

Tags: aws, terraform, architecture, devops

Project Overview

Designing for failure is a core principle of cloud computing. This project demonstrates how to build a highly available (HA) and fault-tolerant web application architecture on AWS using Infrastructure as Code (IaC) with Terraform.

The goal is to create an infrastructure that can withstand the failure of a single Availability Zone (AZ) without service interruption and automatically scale based on traffic demand.

Architecture

The architecture consists of the following components distributed across two Availability Zones:

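VPC: A custom Virtual Private Cloud (10.0.0.0/16) with one public and one private subnet in each AZ.
Application Load Balancer (ALB): Sits in the public subnets and distributes incoming HTTP traffic across healthy EC2 instances.
Auto Scaling Group (ASG): Keeps between two and four EC2 instances running across both AZs and replaces instances that fail. Scaling on CPU utilization additionally requires a scaling policy attached to the group.
EC2 Instances: Web servers running Apache (httpd), placed in private subnets so they are not directly reachable from the internet.
NAT Gateways: One per AZ, so private instances can reach the internet for updates without accepting inbound connections.
Security Groups: The ALB accepts HTTP from the internet; the web servers accept HTTP only from the ALB.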

Infrastructure as Code (Terraform)

Below are the key Terraform configurations used to provision this infrastructure.
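The snippets in this guide omit provider setup. A minimal sketch of what it could look like follows. The version constraints are assumptions and are not taken from the original project; the region is inferred from the us-east-1a and us-east-1b zones used in the VPC module.

```hcl
terraform {
  # Assumed constraint; use whatever version you have tested against.
  required_version = ">= 1.3"

  required_providers {
    aws = {
      source  = "hashicorp/aws"
      version = "~> 5.0" # assumption, not from the original project
    }
  }
}

provider "aws" {
  # Must match the region of the Availability Zones used by the VPC.
  region = "us-east-1"
}
```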

1. VPC and Networking

We start by defining the VPC and subnets. We use the community terraform-aws-modules/vpc/aws module, which creates the subnets, route tables, internet gateway, and NAT gateways from a handful of inputs.

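module "vpc" {
  source = "terraform-aws-modules/vpc/aws"
  # Consider pinning the module with a `version` constraint so upgrades are deliberate.

  name = "ha-vpc"
  cidr = "10.0.0.0/16"

  # One private and one public subnet per Availability Zone.
  azs             = ["us-east-1a", "us-east-1b"]
  private_subnets = ["10.0.1.0/24", "10.0.2.0/24"]
  public_subnets  = ["10.0.101.0/24", "10.0.102.0/24"]

  # One NAT gateway per AZ, so losing an AZ does not cut off
  # outbound access for instances in the surviving AZ.
  enable_nat_gateway     = true
  single_nat_gateway     = false
  one_nat_gateway_per_az = true
  enable_vpn_gateway     = false

  tags = {
    Terraform   = "true"
    Environment = "prod"
  }
}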
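The AZ names and subnet CIDRs above are hard-coded. As a sketch of one alternative, they can be derived with the `aws_availability_zones` data source and the built-in `cidrsubnet` function. The local names (`vpc_cidr`, `az_count`, `azs`, and so on) are hypothetical and not part of the original project.

```hcl
# Look up the AZs that are currently available in the provider's region.
data "aws_availability_zones" "available" {
  state = "available"
}

locals {
  vpc_cidr = "10.0.0.0/16"
  az_count = 2

  # Take the first two AZ names returned for the region.
  azs = slice(data.aws_availability_zones.available.names, 0, local.az_count)

  # cidrsubnet(prefix, newbits, netnum): adding 8 bits to a /16 yields /24 blocks.
  # netnum 1 and 2 give 10.0.1.0/24 and 10.0.2.0/24.
  private_subnets = [for i in range(local.az_count) : cidrsubnet(local.vpc_cidr, 8, i + 1)]

  # netnum 101 and 102 give 10.0.101.0/24 and 10.0.102.0/24.
  public_subnets = [for i in range(local.az_count) : cidrsubnet(local.vpc_cidr, 8, i + 101)]
}
```

These locals would then be passed to the module as `azs = local.azs`, `private_subnets = local.private_subnets`, and `public_subnets = local.public_subnets`. The computed CIDRs are the same as the literal values, so this only removes duplication; it does not change the network layout.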

2. Security Groups

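We use two security groups. The ALB security group accepts HTTP from anywhere. The web server security group accepts HTTP only from the ALB security group, so the instances cannot be reached directly, even from inside the VPC.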

resource "aws_security_group" "alb_sg" {
  name        = "alb-sg"
  description = "Allow HTTP inbound traffic"
  vpc_id      = module.vpc.vpc_id

  ingress {
    description = "HTTP from Internet"
    from_port   = 80
    to_port     = 80
    protocol    = "tcp"
    cidr_blocks = ["0.0.0.0/0"]
  }

  egress {
    from_port   = 0
    to_port     = 0
    protocol    = "-1"
    cidr_blocks = ["0.0.0.0/0"]
  }
}

resource "aws_security_group" "web_sg" {
  name        = "web-server-sg"
  description = "Allow HTTP from ALB"
  vpc_id      = module.vpc.vpc_id

  ingress {
    description     = "HTTP from ALB"
    from_port       = 80
    to_port         = 80
    protocol        = "tcp"
    security_groups = [aws_security_group.alb_sg.id]
  }

  egress {
    from_port   = 0
    to_port     = 0
    protocol    = "-1"
    cidr_blocks = ["0.0.0.0/0"]
  }
}

3. Application Load Balancer

The ALB serves as the entry point for our application.

resource "aws_lb" "main" {
  name               = "ha-alb"
  internal           = false
  load_balancer_type = "application"
  security_groups    = [aws_security_group.alb_sg.id]
  subnets            = module.vpc.public_subnets
}

resource "aws_lb_listener" "front_end" {
  load_balancer_arn = aws_lb.main.arn
  port              = 80
  protocol          = "HTTP"

  default_action {
    type             = "forward"
    target_group_arn = aws_lb_target_group.web.arn
  }
}

resource "aws_lb_target_group" "web" {
  name     = "web-target-group"
  port     = 80
  protocol = "HTTP"
  vpc_id   = module.vpc.vpc_id
  
  health_check {
    path              = "/"
    healthy_threshold = 2
  }
}
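To find the address of the load balancer after deployment, it can help to expose its DNS name as an output. This is a small sketch; the output name `alb_dns_name` is an assumption, while `dns_name` is an attribute exported by `aws_lb`.

```hcl
output "alb_dns_name" {
  description = "Public DNS name of the Application Load Balancer"
  value       = aws_lb.main.dns_name
}
```

After an apply, `terraform output alb_dns_name` prints the hostname to open in a browser or pass to curl.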

4. Auto Scaling Group

Finally, we define a launch template for the web servers and an Auto Scaling Group that keeps two to four of them running across the private subnets.

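resource "aws_launch_template" "web" {
  name_prefix = "web-server-"

  # AMI IDs are region-specific. Confirm that this ID exists in us-east-1 and
  # is the Amazon Linux 2 image that the yum-based script below assumes.
  image_id      = "ami-0c55b159cbfafe1f0"
  instance_type = "t3.micro"

  network_interfaces {
    associate_public_ip_address = false # instances live in private subnets
    security_groups             = [aws_security_group.web_sg.id]
  }

  # Launch templates expect base64-encoded user data.
  # The script installs and starts Apache (httpd) and writes a test page.
  user_data = base64encode(<<-EOF
    #!/bin/bash
    yum update -y
    yum install -y httpd
    systemctl start httpd
    systemctl enable httpd
    echo "<h1>Hello from $(hostname -f)</h1>" > /var/www/html/index.html
    EOF
  )
}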
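The launch template above hard-codes an AMI ID, which ties the configuration to one region and one image build. One possible alternative is to look the image up at plan time. The sketch below assumes the usual Amazon Linux 2 naming pattern for x86_64 HVM images; the data source name `amazon_linux_2` is hypothetical.

```hcl
# Find the most recent Amazon Linux 2 image published by Amazon
# in whichever region the provider is configured for.
data "aws_ami" "amazon_linux_2" {
  most_recent = true
  owners      = ["amazon"]

  filter {
    name   = "name"
    values = ["amzn2-ami-hvm-*-x86_64-gp2"] # assumed naming pattern
  }

  filter {
    name   = "virtualization-type"
    values = ["hvm"]
  }
}
```

The launch template would then use `image_id = data.aws_ami.amazon_linux_2.id`. The trade-off is that `most_recent = true` can resolve to a newer image on a later plan, which produces a new launch template version, so some teams prefer to pin a reviewed AMI ID per region instead.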

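resource "aws_autoscaling_group" "web" {
  desired_capacity    = 2
  max_size            = 4
  min_size            = 2
  vpc_zone_identifier = module.vpc.private_subnets # spreads instances across both AZs
  target_group_arns   = [aws_lb_target_group.web.arn]

  # Use the ALB health check so instances that fail it are replaced,
  # not only instances that fail EC2 status checks.
  health_check_type         = "ELB"
  health_check_grace_period = 300 # seconds allowed for boot and user data

  launch_template {
    id      = aws_launch_template.web.id
    version = "$Latest"
  }
}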
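The group above fixes the instance count between two and four but does not by itself react to load. A target tracking policy is one way to scale on CPU utilization. The sketch below is an illustration rather than part of the original configuration: the resource name, policy name, and 50% target are assumptions.

```hcl
resource "aws_autoscaling_policy" "cpu_target" {
  name                   = "web-cpu-target-tracking"
  autoscaling_group_name = aws_autoscaling_group.web.name
  policy_type            = "TargetTrackingScaling"

  target_tracking_configuration {
    predefined_metric_specification {
      # Average CPU utilization across all instances in the group.
      predefined_metric_type = "ASGAverageCPUUtilization"
    }

    # Assumed target; tune it from load tests.
    target_value = 50.0
  }
}
```

With a policy like this in place, the group adds instances when average CPU stays above the target and removes them when it falls below, within the `min_size` and `max_size` bounds. Because the policy changes the desired capacity at runtime, a later `terraform apply` would reset it to the configured `desired_capacity`; a common workaround is to omit that argument or add `lifecycle { ignore_changes = [desired_capacity] }` to the group.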

Deployment

To deploy this infrastructure:

Initialize Terraform: terraform init
Review the plan: terraform plan
Apply the changes: terraform apply

Conclusion

This architecture provides a solid foundation for production workloads. Defining it in Terraform makes the infrastructure reproducible, version-controlled, and reviewable. Spreading instances across two AZs behind an ALB lets the application keep serving traffic if a single AZ fails, and the Auto Scaling Group replaces failed instances and can add capacity, up to its configured maximum, when traffic grows.