Saturday, November 3, 2012

AWS EC2 Auto Scaling: Basic Configuration

[Diagram: AWS EC2 Auto Scaling basic configuration]

Our goal: Create an Auto Scaling EC2 Group in a single Availability Zone and use an HTTP status page as the Health Monitor for both our Load Balancer and the Auto Scaling group instances.

This exercise covers some Auto Scaling basics and helps to understand the underlying concepts, but the Auto Scaling Group will not automatically "scale" in response to external influences such as Average CPU Usage or Total Apache Connections (that aspect is covered in this post: AWS EC2 Auto Scaling: External CloudWatch Metric). With the Auto Scaling configuration described here, we will obtain a web server cluster whose number of members can be increased or decreased with a simple Auto Scaling API call, and we will hand the monitoring role over to the ELB so that failed EC2 instances or web servers are replaced automatically.


What we need for the exercise:
This exercise assumes you have previous experience with EC2 Instances, Security Groups, Custom AMIs and EC2 Load Balancers.

- An empty ELB.
- A custom AMI with an HTTP server installed.
- A custom test web page called "ping.html".
- An EC2 Key Pair to access our instances.
- An EC2 Security Group.
- Auto Scaling API. If you need help configuring access to the Auto Scaling API, check this post.


Preparation:
It is important to make sure that all the ingredients work as expected. Auto Scaling can be difficult to debug, and nasty situations may occur, such as a group of instances starting while you are away, or a new instance starting and stopping every 20 seconds with bad billing consequences (AWS will charge you a full hour for any started instance, even if it has been running for only one minute).
I strongly suggest testing your setup manually before creating an Auto Scaling configuration.

- Create your Key Pair (in my example "juankeys").

- Deploy an ELB (in my example it is named "elb-prueba") in your default AZ ("a"). Configure the ELB to use your custom /ping.html page as the Instance Health Monitor. You should see something like this:


- Create a Security Group for your web server instances (in my example "web-servers"). Add the ELB Security Group to this Security Group for port 80. It should look like the capture below. In this example, the SG allows ping and TCP access from my home to the instances, and allows connections to port 80 that originate from my Load Balancers (amazon-elb-sg). The web server port 80 is not open to the Internet; it is only open to the ELB.



- Deploy an EC2 instance using the previously created Key Pair and Security Group. Install an HTTP server and make sure it is configured to start automatically. Create a test page called /ping.html in the web server root folder. This test page can print any text you like; its only mission is to be present. An HTTP 200 is OK and anything else is KO.

- Create your custom AMI from the temporary instance you have just created. Terminate that temporary instance when finished.

- Deploy a new instance using the recently created AMI (in my example "ami-1ceb5075") to test it. Check that the HTTP server starts automatically.

- Manually add the recently created instance to the ELB. Verify that the Load Balancer check works and reports the status "In Service" for this instance. Verify that the /ping.html page can be accessed from the Internet using a browser and the ELB public DNS name ("http://(your-ELB-DNS-name)/ping.html").

- Once everything checks OK, remove the instance from the ELB and Terminate the instance.
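The browser check in the last steps can also be done from a terminal. The following is a minimal sketch, assuming curl is installed on your workstation; the function names http_status and ping_ok are hypothetical helpers, and the host name in the usage comment is a placeholder for your own ELB public DNS name.

```shell
#!/bin/sh
# Sketch: confirm that /ping.html answers HTTP 200 through the ELB.

# Print the HTTP status code returned for a URL.
# curl prints "000" when it cannot get a response at all.
http_status() {
    curl --silent --output /dev/null --max-time 5 --write-out '%{http_code}' "$1"
}

# Succeed only when the page returns 200: the "200 is OK, anything
# else is KO" rule described in the preparation steps.
ping_ok() {
    [ "$(http_status "$1")" = "200" ]
}

# Usage (placeholder host name, replace it with your ELB DNS name):
#   ping_ok "http://your-ELB-DNS-name/ping.html" && echo "OK" || echo "KO"
```

Running the same check directly against the instance (from an address allowed by the Security Group) helps to tell a web server problem from an ELB configuration problem.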


Definition:

# as-create-launch-config config-prueba --image-id ami-1ceb5075 --instance-type t1.micro --monitoring-disabled --group web-servers --key juankeys

OK-Created launch config

# as-create-auto-scaling-group grupo-prueba --launch-configuration config-prueba --availability-zones us-east-1a --min-size 0 --max-size 4 --load-balancers elb-prueba --health-check-type ELB --grace-period 120

OK-Created AutoScalingGroup

With as-create-launch-config we define the instance configuration that our Auto Scaling Group will use: launch config name, AMI ID, instance type, Advanced Monitoring (1 minute monitoring) disabled, Security Group and Key Pair.

With as-create-auto-scaling-group we define the group itself: group name, launch config to use, AZs to deploy in, the minimum number of running instances that our application needs, the maximum number of instances we want to scale up to, ELB name, the Health Check type set to ELB (the default is the EC2 System Status) and the grace period granted to an instance after launch before it is checked (in seconds).

Note: By default all the API calls are sent to the us-east-1 Region (N. Virginia).


Describe:

# as-describe-launch-configs --headers

LAUNCH-CONFIG  NAME           IMAGE-ID      TYPE
LAUNCH-CONFIG  config-prueba  ami-1ceb5075  t1.micro  

# as-describe-auto-scaling-groups --headers


AUTO-SCALING-GROUP  GROUP-NAME    LAUNCH-CONFIG  AVAILABILITY-ZONES  LOAD-BALANCERS  MIN-SIZE  MAX-SIZE  DESIRED-CAPACITY  TERMINATION-POLICIES
AUTO-SCALING-GROUP  grupo-prueba  config-prueba  us-east-1a          elb-prueba      0         4         0                 Default 

We use the "as-describe-" commands to read back the result of our last configuration. Pay special attention to as-describe-auto-scaling-instances:

# as-describe-auto-scaling-instances --headers   

No instances found

This command gives us a quick look at the running instances within our AS Groups. It is very useful when dealing with AS to find out how many instances are running and what state they are in. Right now the result is "No instances found", and this is correct: our current configuration says that zero is the minimum number of healthy instances our application needs, and therefore zero instances are running.

Bring it to Production:

Let's tell AS that the minimum is now 1 and describe the configuration:

# as-update-auto-scaling-group grupo-prueba --min-size 1

OK-Updated AutoScalingGroup

#  as-describe-auto-scaling-groups --headers

AUTO-SCALING-GROUP  GROUP-NAME    LAUNCH-CONFIG  AVAILABILITY-ZONES  LOAD-BALANCERS  MIN-SIZE  MAX-SIZE  DESIRED-CAPACITY  TERMINATION-POLICIES
AUTO-SCALING-GROUP  grupo-prueba  config-prueba  us-east-1a          elb-prueba      1         4         1                 Default          
INSTANCE  INSTANCE-ID  AVAILABILITY-ZONE  STATE    STATUS   LAUNCH-CONFIG
INSTANCE  i-5bb9e427   us-east-1a         Pending  Healthy  config-prueba

Notice that the minimum is now 1 in the AS configuration and that there is a new instance in our AS Group ("i-5bb9e427" in this example). AS has deployed this instance automatically to match the desired number of healthy instances for our application. Notice the "Pending" state, which means that the instance is still initializing. We can follow this process with as-describe-auto-scaling-instances:

#  as-describe-auto-scaling-instances
INSTANCE  i-5bb9e427  grupo-prueba  us-east-1a  Pending  HEALTHY  config-prueba
# as-describe-auto-scaling-instances
INSTANCE  i-5bb9e427  grupo-prueba  us-east-1a  Pending  HEALTHY  config-prueba
# as-describe-auto-scaling-instances
INSTANCE  i-5bb9e427  grupo-prueba  us-east-1a  Pending  HEALTHY  config-prueba
# as-describe-auto-scaling-instances
INSTANCE  i-5bb9e427  grupo-prueba  us-east-1a  Pending  HEALTHY  config-prueba
# as-describe-auto-scaling-instances
INSTANCE  i-5bb9e427  grupo-prueba  us-east-1a  InService  HEALTHY  config-prueba 
# as-describe-auto-scaling-instances
INSTANCE  i-5bb9e427  grupo-prueba  us-east-1a  InService  HEALTHY  config-prueba
# as-describe-auto-scaling-instances
INSTANCE  i-5bb9e427  grupo-prueba  us-east-1a  InService  HEALTHY  config-prueba

Now the recently deployed instance is in service. That means that its Health Check (the ELB ping.html test page) passes. If you open the AWS Console and look at the current ELB "Instances" tab, the new instance ID should be there, automatically added to the Load Balancer, with your application up and running.
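Instead of repeating the describe command by hand, the wait can be scripted. This is only a sketch: the helper names instance_state and wait_in_service and the DESCRIBE_CMD variable are assumptions made for illustration, and the parsing relies on the column layout shown in the output above (instance ID in the second column, state in the fifth).

```shell
#!/bin/sh
# Sketch: poll until an Auto Scaling instance reaches the InService state.
# DESCRIBE_CMD is a hypothetical variable so the command can be swapped out.
DESCRIBE_CMD="${DESCRIBE_CMD:-as-describe-auto-scaling-instances}"

# Read describe output on stdin and print the state column for one instance ID.
instance_state() {
    awk -v id="$1" '$1 == "INSTANCE" && $2 == id { print $5 }'
}

# Arguments: instance ID, number of attempts (default 30), delay in seconds (default 10).
# Returns 0 as soon as the instance is InService, 1 if the attempts run out.
wait_in_service() {
    local id="$1" attempts="${2:-30}" delay="${3:-10}" try=0 state
    while [ "$try" -lt "$attempts" ]; do
        state="$("$DESCRIBE_CMD" | instance_state "$id")"
        if [ "$state" = "InService" ]; then
            return 0
        fi
        sleep "$delay"
        try=$((try + 1))
    done
    return 1
}

# Usage:
#   wait_in_service i-5bb9e427 && echo "instance is in service"
```

Bounding the number of attempts keeps the loop from running forever when the instance never becomes healthy, which is exactly the situation described in the problem scenarios.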

Common problem scenarios:
- If you observe that new instances are constantly deployed and terminated by AS, this probably means that the ping.html page check fails. Stop the experiment with "as-update-auto-scaling-group grupo-prueba --min-size 0" and verify your components.
- If your web server and test page are verified OK but AS is still deploying and terminating instances before they have a chance to reach the Healthy status, you should increase the value of "--grace-period" in the AS Group definition to give your AMI more time to start and initialize its services.
- If the instances start but fail to be added automatically to the ELB, the instances are probably deployed in the wrong Availability Zone. Either correct the Availability Zones in your AS Group definition or expand the ELB to the rest of the AZs in your Region.


Sabotage:

Log in as root to the recently deployed AS instance and force it to fail with this command: "mv /var/www/html/ping.html /var/www/html/ping.html.KO". You can see in the /var/log/httpd/access_log file that the ELB keeps requesting the test page and that the requests now fail:

- 10.29.36.216 - - [03/Nov/2012:12:23:45 +0000] "GET /ping.html HTTP/1.1" 200 49 "-" "ELB-HealthChecker/1.0"
- 10.29.36.216 - - [03/Nov/2012:12:23:51 +0000] "GET /ping.html HTTP/1.1" 200 49 "-" "ELB-HealthChecker/1.0"
- 10.29.36.216 - - [03/Nov/2012:12:23:57 +0000] "GET /ping.html HTTP/1.1" 404 286 "-" "ELB-HealthChecker/1.0" 
- 10.29.36.216 - - [03/Nov/2012:12:24:03 +0000] "GET /ping.html HTTP/1.1" 404 286 "-" "ELB-HealthChecker/1.0"
- 10.29.36.216 - - [03/Nov/2012:12:24:09 +0000] "GET /ping.html HTTP/1.1" 404 286 "-" "ELB-HealthChecker/1.0"
- 10.29.36.216 - - [03/Nov/2012:12:24:15 +0000] "GET /ping.html HTTP/1.1" 404 286 "-" "ELB-HealthChecker/1.0"
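The failed checks can also be counted rather than read by eye. A small sketch, assuming the usual Apache combined log format in which the status code is the ninth whitespace-separated field of each line in the log file; failed_health_checks is a hypothetical helper name.

```shell
#!/bin/sh
# Sketch: count ELB health checks that did not return 200.
# Reads an Apache access log on stdin.
failed_health_checks() {
    awk '/ELB-HealthChecker/ && $9 != 200 { n++ } END { print n + 0 }'
}

# Usage:
#   failed_health_checks < /var/log/httpd/access_log
```

Only requests whose user agent contains "ELB-HealthChecker" are counted, so ordinary visitor traffic does not affect the result.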

Let's see what happens soon after.

# as-describe-auto-scaling-instances
INSTANCE  i-5bb9e427  grupo-prueba  us-east-1a  InService  HEALTHY  config-prueba
# as-describe-auto-scaling-instances
INSTANCE  i-5bb9e427  grupo-prueba  us-east-1a  InService  HEALTHY  config-prueba
# as-describe-auto-scaling-instances
INSTANCE  i-5bb9e427  grupo-prueba  us-east-1a  InService  HEALTHY  config-prueba
# as-describe-auto-scaling-instances
INSTANCE  i-5bb9e427  grupo-prueba  us-east-1a  InService  UNHEALTHY  config-prueba
# as-describe-auto-scaling-instances
INSTANCE  i-5bb9e427  grupo-prueba  us-east-1a  InService  UNHEALTHY  config-prueba
# as-describe-auto-scaling-instances
INSTANCE  i-5bb9e427  grupo-prueba  us-east-1a  Terminating  UNHEALTHY  config-prueba
# as-describe-auto-scaling-instances
INSTANCE  i-5bb9e427  grupo-prueba  us-east-1a  Terminating  UNHEALTHY  config-prueba
# as-describe-auto-scaling-instances
INSTANCE  i-5bb9e427  grupo-prueba  us-east-1a  Terminating  UNHEALTHY  config-prueba
# as-describe-auto-scaling-instances
INSTANCE  i-5bb9e427  grupo-prueba  us-east-1a  Terminating  UNHEALTHY  config-prueba
# as-describe-auto-scaling-instances
INSTANCE  i-3dce9341  grupo-prueba  us-east-1a  Pending      HEALTHY    config-prueba
INSTANCE  i-5bb9e427  grupo-prueba  us-east-1a  Terminating  UNHEALTHY  config-prueba
# as-describe-auto-scaling-instances
INSTANCE  i-3dce9341  grupo-prueba  us-east-1a  Pending      HEALTHY    config-prueba
INSTANCE  i-5bb9e427  grupo-prueba  us-east-1a  Terminating  UNHEALTHY  config-prueba  
# as-describe-auto-scaling-instances
INSTANCE  i-3dce9341  grupo-prueba  us-east-1a  Pending  HEALTHY  config-prueba
# as-describe-auto-scaling-instances
INSTANCE  i-3dce9341  grupo-prueba  us-east-1a  Pending  HEALTHY  config-prueba
# as-describe-auto-scaling-instances
INSTANCE  i-3dce9341  grupo-prueba  us-east-1a  Pending  HEALTHY  config-prueba
# as-describe-auto-scaling-instances
INSTANCE  i-3dce9341  grupo-prueba  us-east-1a  InService  HEALTHY  config-prueba
# as-describe-auto-scaling-instances
INSTANCE  i-3dce9341  grupo-prueba  us-east-1a  InService  HEALTHY  config-prueba
# as-describe-auto-scaling-instances
INSTANCE  i-3dce9341  grupo-prueba  us-east-1a  InService  HEALTHY  config-prueba

After a while, our initial instance (i-5bb9e427) is declared UNHEALTHY and scheduled for termination because the test page check has failed several times. At the same time, a new instance (i-3dce9341) is deployed, tested and added to the ELB to satisfy our "minimum=1" criterion. Auto Scaling (together with the ELB) monitors our cluster: any failed instance will be removed and a new one will be launched.

We have learned something here: an instance in an AS environment is volatile. It can disappear at any time because it is terminated, and its EBS volumes disappear with it. You have to take that into account when designing your infrastructure. If your web server needs to store information that you may need later, you should save it elsewhere: CloudWatch, an external log server, a database, etc.


Maneuvers:

Changing the minimum number of instances in the AS configuration is one way to change the number of running instances, but there are others.

- We can force the number of running instances by changing the "--desired-capacity" in the AS Group definition:

as-update-auto-scaling-group grupo-prueba --desired-capacity X 

- You can scale by Schedule: AWS Scaling by Schedule documentation.

- And you can scale by Policy. This aspect is covered in this post: AWS EC2 Auto Scaling: External CloudWatch Metric.
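For the first option, keep in mind that the desired capacity has to stay within the group's minimum and maximum size. The sketch below checks that before calling the update command; capacity_in_range, set_capacity and the UPDATE_CMD variable are hypothetical names introduced for illustration, and the limits are passed in by hand rather than read from the group.

```shell
#!/bin/sh
# Sketch: change the desired capacity only when it fits the group limits.
# UPDATE_CMD is a hypothetical variable so the command can be swapped out.
UPDATE_CMD="${UPDATE_CMD:-as-update-auto-scaling-group}"

# Arguments: desired, min, max. Succeeds when min <= desired <= max.
capacity_in_range() {
    [ "$1" -ge "$2" ] && [ "$1" -le "$3" ]
}

# Arguments: group name, desired capacity, min size, max size.
set_capacity() {
    if capacity_in_range "$2" "$3" "$4"; then
        "$UPDATE_CMD" "$1" --desired-capacity "$2"
    else
        echo "desired capacity $2 is outside the range $3..$4" >&2
        return 1
    fi
}

# Usage with the limits of the example group (min 0, max 4):
#   set_capacity grupo-prueba 2 0 4
```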


Cleaning:

You don't want an AS Group doing things while you sleep, so I suggest deleting all your AS configurations once your test is done.

# as-update-auto-scaling-group grupo-prueba --min-size 0

OK-Updated AutoScalingGroup

# as-update-auto-scaling-group grupo-prueba --desired-capacity 0

OK-Updated AutoScalingGroup

# as-delete-auto-scaling-group grupo-prueba


    Are you sure you want to delete this AutoScalingGroup? [Ny]y

OK-Deleted AutoScalingGroup


# as-delete-launch-config config-prueba


    Are you sure you want to delete this launch configuration? [Ny]y  

OK-Deleted launch configuration


# as-describe-auto-scaling-instances 

No instances found
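The final check can be scripted too, for example at the end of a cleanup script. A minimal sketch: count_instances is a hypothetical helper that counts the INSTANCE rows in the describe output and skips the header row that appears with --headers.

```shell
#!/bin/sh
# Sketch: count the instances listed by as-describe-auto-scaling-instances.
# Reads the command output on stdin; "No instances found" yields 0.
count_instances() {
    awk '$1 == "INSTANCE" && $2 != "INSTANCE-ID" { n++ } END { print n + 0 }'
}

# Usage:
#   remaining="$(as-describe-auto-scaling-instances | count_instances)"
#   [ "$remaining" -eq 0 ] && echo "cleanup complete"
```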