
Tuesday, September 17, 2013

Using Varnish Proxy Cache with Amazon Web Service ELB Elastic Load Balancer




Update 19-Feb-2014! Elastic Load Balancing Announces Cross-Zone Load Balancing
Maybe this new option makes my workaround unnecessary. Can anyone confirm?


The problem
When putting a Varnish cache in front of an AWS EC2 Elastic Load Balancer, weird things happen: no traffic reaches your instances, or traffic reaches only one of them (in a Multi Availability Zone (AZ) deployment).

Why?
This has to do with how the ELB is designed and how Varnish is designed. It is not a flaw; let's call it an incompatibility.
When you deploy an Elastic Load Balancer in EC2 you access it through a CNAME DNS record. When that ELB sits in front of multiple instances in multiple Availability Zones, the CNAME does not resolve to a single address but to many.

Example:
$ dig www.netflix.com

; <<>> DiG 9.8.1-P1 <<>> www.netflix.com

;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 64502
;; flags: qr rd ra; QUERY: 1, ANSWER: 4, AUTHORITY: 0, ADDITIONAL: 0

;; QUESTION SECTION:

;www.netflix.com. IN A

;; ANSWER SECTION:

www.netflix.com. 300 IN CNAME dualstack.wwwservice--frontend-san-756424683.us-east-1.elb.amazonaws.com.
dualstack.wwwservice--frontend-san-756424683.us-east-1.elb.amazonaws.com. 60 IN A 184.73.248.59
dualstack.wwwservice--frontend-san-756424683.us-east-1.elb.amazonaws.com. 60 IN A 107.22.235.237
dualstack.wwwservice--frontend-san-756424683.us-east-1.elb.amazonaws.com. 60 IN A 184.73.252.179


As you can see, the DNS resolution of this CNAME for Netflix's ELB returns 3 different IP addresses. It is up to the application (usually your web browser) to decide which one to use. Different clients will choose different IPs (they are not always sorted the same way), and this balances the traffic among the different AZs.
The bottom line is that in real life your ELB is multiple instances in multiple AZs, and the CNAME mechanism is the method used to balance them.

But Varnish behaves differently
When you specify a CNAME as a Varnish backend server (the destination server Varnish requests will be sent to), Varnish translates it into a single IP. No matter how many IP addresses are associated with that CNAME, it picks one and uses it for all its activity. Therefore Varnish and the AWS ELB are not compatible out of the box. (Would you like to suggest a change?)

The Solution
Put an NGINX web server between Varnish and the ELB, acting as a load balancer. I know, not elegant, but it works; once it is in place no maintenance is needed, and the process overhead for the Varnish server is minimal.

Setup
- Varnish server listening on TCP port 80 and configured to send all its requests to 127.0.0.1:8080
- NGINX server listening on 127.0.0.1:8080 and forwarding all its requests to our EC2 ELB.

Basic configuration (using the Amazon Linux EC2 AMI)

yum update
reboot

yum install varnish
yum install nginx

chkconfig varnish on
chkconfig nginx on

Varnish

vim /etc/sysconfig/varnish

Locate the line:
VARNISH_LISTEN_PORT=6081
and change it to
VARNISH_LISTEN_PORT=80

vim /etc/varnish/default.vcl

Locate the backend default configuration and change port from 80 to 8080
backend default {
  .host = "127.0.0.1";
  .port = "8080";
}
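
Before restarting, you can check that the VCL compiles (a quick sanity check: varnishd -C parses the file and prints the generated C code, or a compile error):

varnishd -C -f /etc/varnish/default.vcl > /dev/null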


NGINX

vim /etc/nginx/nginx.conf

Get rid of the default configuration file and use this example:
worker_processes  1;

events {
    worker_connections  1024;
}

http {
    include       mime.types;
    default_type  application/octet-stream;

    keepalive_timeout  65;

    server_tokens off;

    server {
        listen       localhost:8080;

        location / {
            ### Insert your ELB DNS Name here, leaving the semicolon at the end of the line
            proxy_pass  http://<<<<Insert-here-your-ELB-DNS-Name>>>>;
            proxy_set_header Host $http_host;
        }
    }
}
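
Before restarting, test the configuration (nginx -t parses the file and reports any syntax errors):

nginx -t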


Restart
service varnish restart
service nginx restart
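
A quick way to verify the chain once both services are running (a sketch, executed on the server itself; Varnish normally adds Via and X-Varnish headers to its responses):

curl -sI http://localhost/ | egrep -i 'via|x-varnish|age'
curl -sI http://localhost:8080/

The first request goes through Varnish on port 80; the second talks directly to NGINX on port 8080, which should answer with the response proxied from the ELB.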

And voilà! Comments and improvements are welcome.


Thanks to
Jordi and Àlex for your help!




Wednesday, November 7, 2012

AWS EC2 Auto Scaling: External CloudWatch Metric

[Diagram: AWS EC2 Auto Scaling with an external CloudWatch metric]


Our Goal: Create an Auto Scaling EC2 Group in a single Availability Zone and use a Custom CloudWatch metric to scale up (and down) our Web Server cluster behind an ELB.

This exercise includes the Basic Auto Scaling scenario discussed earlier, but now we will add real Auto Scaling capability using a metric generated inside our application (like Apache Busy Workers). You have a post here about creating custom metrics in CloudWatch. You can easily adapt that configuration to any other custom metric.

What we need for this exercise:

This exercise assumes you have previous experience with EC2 Instances, Security Groups, Custom AMIs and EC2 Load Balancers.

We need:

- An empty ELB.
- A custom AMI.
- An EC2 Key Pair to use to access our instances.
- An EC2 Security Group.
- Auto Scaling API (if you need help configuring access to the Auto Scaling API, check this post).
- An Apache HTTP server with the mod_status module.
- A script to collect the mod_status value and store it in CloudWatch.
- A custom Test Web Page called "/ping.html".

Preparation:

It is important to be sure that all the ingredients are working as expected. Auto Scaling can be difficult to debug, and nasty situations may occur, like a group of instances starting while you are away, or a new instance starting and stopping every 20 seconds with bad billing consequences (AWS will charge you a full hour for any started instance, even if it only ran for one minute).
I strongly suggest manually testing your components before creating an Auto Scaling configuration.

- Create your Key Pair (In my example "juankeys").

- Deploy an ELB (in my example named "elb-prueba") in your default AZ ("a"). Configure the ELB to use your custom /ping.html page as the Instance Health Check. You should see something like this:


- Create a Security Group for your Web Server instances (in my example "web-servers"). Add the ELB Security Group to it for port 80. It should look like the capture below. In this example this SG allows Ping and TCP access from my home to the Instances AND allows access to port 80 for connections originating from my Load Balancers (amazon-elb-sg). The web server port 80 is not open to the Internet, only to the ELB.



- Deploy an EC2 Instance using the previously created Key Pair and Security Group. Install an Apache HTTP server and make sure it is configured to start automatically. Create a Test Page called /ping.html at the web server root folder. This test page can print any text you like; its only mission is to be present. An HTTP 200 is OK and anything else is KO.

- In this exercise we will add to our custom Linux AMI a script and a crontab entry to create a Custom CloudWatch Metric. We will use what we learned in this previous post.
Once you have the Apache HTTP server installed and mod_status configured following that post's instructions, copy this new script version:

#!/bin/bash

logger "Apache Status Started"

export AWS_CREDENTIAL_FILE=/opt/aws/apitools/mon/credential-file-path.template
export AWS_CLOUDWATCH_HOME=/opt/aws/apitools/mon
export AWS_IAM_HOME=/opt/aws/apitools/iam
export AWS_PATH=/opt/aws
export AWS_AUTO_SCALING_HOME=/opt/aws/apitools/as
export AWS_ELB_HOME=/opt/aws/apitools/elb
export AWS_RDS_HOME=/opt/aws/apitools/rds
export EC2_AMITOOL_HOME=/opt/aws/amitools/ec2
export EC2_HOME=/opt/aws/apitools/ec2
export JAVA_HOME=/usr/lib/jvm/jre
export PATH=/usr/local/sbin:/usr/local/bin:/sbin:/bin:/usr/sbin:/usr/bin:/opt/aws/bin:/root/bin

# This instance's ID, from the EC2 metadata service
SERVER=`wget -q -O - http://169.254.169.254/latest/meta-data/instance-id`
ASGROUPNAME="grupo-prueba"
# Current number of Apache busy workers, read from mod_status
BUSYWORKERS=`wget -q -O - http://localhost/server-status?auto | grep BusyWorkers | awk '{ print $2 }'`

# Push the value to CloudWatch under the Auto Scaling group's shared namespace
/opt/aws/bin/mon-put-data --metric-name httpd-busyworkers --namespace "AS:$ASGROUPNAME" --unit Count --value $BUSYWORKERS

logger "Apache Status Ended with $SERVER $BUSYWORKERS"


It is similar to the one used before, but now we collect just one metric (instead of two) and store it under a common CloudWatch Namespace. All instances involved in this Auto Scaling exercise will store their Busy Workers values under the same Namespace and Metric Name. In my example the Namespace is "AS:grupo-prueba" and the Metric Name "httpd-busyworkers".

- Create a crontab entry to execute this script every 5 minutes (see the sketch below).
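
For example, a minimal sketch assuming the script above was saved as /usr/local/bin/apache-status.sh (a path of my choosing) and made executable:

# /etc/cron.d/apache-status
*/5 * * * * root /usr/local/bin/apache-status.sh > /dev/null 2>&1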

- Create your Custom AMI from the previously created temporary instance. Terminate the temporary instance when finished.

- Deploy a new instance using the recently created AMI (in my example "ami-0e5ee467") to test the Apache server and the script. Check that the HTTP server starts automatically.

- Manually add the recently created instance to the ELB. Verify that the Load Balancer check works and gives this instance the status "In Service". Verify that the /ping.html page can be accessed from the Internet using a browser and the ELB public DNS name ("http://(your-ELB-DNS-name)/ping.html"), as in the sketch below.
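
For example, from any machine with Internet access (keep your own ELB DNS name in the URL; -I requests only the response headers):

curl -I "http://(your-ELB-DNS-name)/ping.html"

An "HTTP/1.1 200 OK" status line means the page is reachable through the ELB.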

- Verify that the script executes every 5 minutes (following the previous instructions) and that CloudWatch is storing the new metric. You can check that either in the CloudWatch console or from the command line:

# mon-get-stats --metric-name httpd-busyworkers --namespace "AS:grupo-prueba" --statistics Average

2012-11-05 15:15:00  5.0    Count
2012-11-05 15:25:00  5.0    Count
2012-11-05 15:35:00  2.0    Count
2012-11-05 15:40:00  5.0    Count
2012-11-05 15:45:00  5.0    Count

- Once everything is checked, remove the instance from the ELB and Terminate the instance.

Definition:

# as-create-launch-config config-prueba --image-id ami-0e5ee467 --instance-type t1.micro --monitoring-disabled --group web-servers --key juankeys

OK-Created launch config

as-create-auto-scaling-group grupo-prueba --launch-configuration config-prueba --availability-zones us-east-1a --min-size 0 --max-size 4 --load-balancers elb-prueba --health-check-type ELB --grace-period 120

OK-Created AutoScalingGroup

With as-create-launch-config we define the instance configuration we will use in our Auto Scaling Group: Launch Config name, AMI ID, Instance Type, Advanced Monitoring (1-minute monitoring) disabled, Security Group and Key Pair.

With as-create-auto-scaling-group we define the group itself: Group Name, Launch Config to use, AZs to deploy in, the minimum number of running instances our application needs, the maximum number of instances we want to scale up to, ELB name, the Health Check type set to ELB (by default it is the EC2 System Status) and the grace period (in seconds) granted to an instance after launch before it is health-checked.

as-put-scaling-policy scale-up-prueba --auto-scaling-group grupo-prueba --adjustment=1 --type ChangeInCapacity --cooldown 300

arn:aws:autoscaling:us-east-1:085366056805:scalingPolicy:36101053-f0f3-4c7c-bc4c-60a8a2a943a1:autoScalingGroupName/grupo-prueba:policyName/scale-up-prueba

With as-put-scaling-policy we create a Policy called "scale-up-prueba" for the previously created AS Group. When triggered, it will increase the group by one unit (one instance). No other AS activities for this Group are allowed until 300 seconds have passed. After this successful API call an ARN identifier is returned. Save it, because we will need it for the Alarm definition.

mon-put-metric-alarm scale-up-alarm --comparison-operator GreaterThanThreshold --evaluation-periods 1 --metric-name httpd-busyworkers --namespace "AS:grupo-prueba" --period 600 --statistic Average --threshold 10 --alarm-actions arn:aws:autoscaling:us-east-1:085366056805:scalingPolicy:36101053-f0f3-4c7c-bc4c-60a8a2a943a1:autoScalingGroupName/grupo-prueba:policyName/scale-up-prueba

OK-Created Alarm

With mon-put-metric-alarm we create a new CloudWatch alarm called "scale-up-alarm" that is triggered when the 10-minute average of all the "httpd-busyworkers" values is greater than 10. The scale-up policy is then executed through the ARN identifier. In this example each Apache server with no external load averages 5 busy workers, so a threshold of 10 is a good way to test the capacity increase. In a real-world configuration those values will be very different, and you will have to tune them to match your application.

as-put-scaling-policy scale-down-prueba --auto-scaling-group grupo-prueba --adjustment=-1 --type ChangeInCapacity --cooldown 300

arn:aws:autoscaling:us-east-1:085366056805:scalingPolicy:0763114c-f1d3-4f35-a9c5-56c2a7466073:autoScalingGroupName/grupo-prueba:policyName/scale-down-prueba

Now we have created the Policy to be executed when the capacity of the AS Group needs to be reduced, and a new ARN identifier is returned.

mon-put-metric-alarm scale-down-alarm --comparison-operator LessThanThreshold --evaluation-periods 1 --metric-name httpd-busyworkers --namespace  "AS:grupo-prueba" --period 600 --statistic Average --threshold 9 --alarm-actions arn:aws:autoscaling:us-east-1:085366056805:scalingPolicy:0763114c-f1d3-4f35-a9c5-56c2a7466073:autoScalingGroupName/grupo-prueba:policyName/scale-down-prueba

OK-Created Alarm

The same way we did with the scale-up alarm, we create a new one to trigger the scale-down process. The configuration is the same, but now the alarm fires when the average falls below 9 Apache busy workers over 10 or more minutes.

Note: By default all the API calls are sent to the us-east-1 Region (N.Virginia).

Describe:

# as-describe-launch-configs --headers

LAUNCH-CONFIG  NAME           IMAGE-ID      TYPE    
LAUNCH-CONFIG  config-prueba  ami-0e5ee467  t1.micro   

as-describe-auto-scaling-groups --headers

AUTO-SCALING-GROUP  GROUP-NAME    LAUNCH-CONFIG  AVAILABILITY-ZONES  LOAD-BALANCERS  MIN-SIZE  MAX-SIZE  DESIRED-CAPACITY  TERMINATION-POLICIES
AUTO-SCALING-GROUP  grupo-prueba  config-prueba  us-east-1a          elb-prueba      0         4         0                 Default   

We use "as-describe-" commands to read the result of our last configuration. Special attention to as-describe-auto-scaling-instances:

# as-describe-auto-scaling-instances --headers   

No instances found

This command gives us a quick look at the running instances within our AS Groups. It is very useful when dealing with AS to find out how many instances are running and their state. Right now the result is "No instances found", and this is correct: our current configuration says that zero is the minimum number of healthy instances our application needs to work.

We can describe the recently created alarms with mon-describe-alarms:

# mon-describe-alarms --headers

ALARM             STATE  ALARM_ACTIONS                             NAMESPACE        METRIC_NAME        PERIOD  STATISTIC  EVAL_PERIODS  COMPARISON            THRESHOLD
scale-down-alarm  ALARM  arn:aws:autoscalin...6056805:AutoScaling  AS:grupo-prueba  httpd-busyworkers  600     Average    1             LessThanThreshold     9.0
scale-up-alarm    OK     arn:aws:sns:us-eas...ame/scale-up-prueba  AS:grupo-prueba  httpd-busyworkers  600     Average    1             GreaterThanThreshold  10.0

Or using the CloudWatch Console:


Under normal circumstances the "scale-down-alarm" will sit in the "ALARM" state; this is expected.
Using the CloudWatch console you can add to these alarms an action that sends an email notification, to get better visibility during the test.

Bring it to Production:

Now the cluster is idle, with no instances running. Let's tell AS that our application requires a minimum of 1 healthy instance:

# as-update-auto-scaling-group grupo-prueba --min-size 1

OK-Updated AutoScalingGroup

#  as-describe-auto-scaling-groups --headers

AUTO-SCALING-GROUP  GROUP-NAME    LAUNCH-CONFIG  AVAILABILITY-ZONES  LOAD-BALANCERS  MIN-SIZE  MAX-SIZE  DESIRED-CAPACITY  TERMINATION-POLICIES
AUTO-SCALING-GROUP  grupo-prueba  config-prueba  us-east-1a          elb-prueba      1         4         1                 Default         
INSTANCE  INSTANCE-ID  AVAILABILITY-ZONE  STATE    STATUS   LAUNCH-CONFIG
INSTANCE  i-9d022be1   us-east-1a         Pending  Healthy  config-prueba

Notice that the minimum is now 1 in the AS configuration and there is a new instance in our AS Group ("i-9d022be1" in this example). This instance has been automatically deployed by AS to match the desired number of healthy instances for our application. Notice the "Pending" state, which means it is still initializing. We can follow the process with as-describe-auto-scaling-instances:

# as-describe-auto-scaling-instances
INSTANCE  i-9d022be1  grupo-prueba  us-east-1a  Pending  HEALTHY  config-prueba
# as-describe-auto-scaling-instances 
INSTANCE  i-9d022be1  grupo-prueba  us-east-1a  Pending  HEALTHY  config-prueba
# as-describe-auto-scaling-instances 
INSTANCE  i-9d022be1  grupo-prueba  us-east-1a  Pending  HEALTHY  config-prueba
# as-describe-auto-scaling-instances 
INSTANCE  i-9d022be1  grupo-prueba  us-east-1a  InService  HEALTHY  config-prueba
# as-describe-auto-scaling-instances 
INSTANCE  i-9d022be1  grupo-prueba  us-east-1a  InService  HEALTHY  config-prueba

Now the recently launched instance is in service, which means its Health Check (the ELB ping.html test page) verifies OK. If you open the AWS Console and look at the ELB "Instances" tab, the new instance ID should be there, automatically added to the Load Balancer, and your application up and running.

Common problem scenarios:
- If you observe that new instances are constantly Launched and Terminated by AS, it probably means the /ping.html page check fails. Stop the experiment with "as-update-auto-scaling-group grupo-prueba --min-size 0" and verify your components.
- If your web server and test page verify OK but AS keeps Launching and Terminating instances before they can reach the Healthy status, increase the value of "--grace-period" in the AS Group definition to give your AMI more time to start and initialize its services (see the sketch after this list).
- If the instances start but fail to be added to the ELB automatically, they are probably being deployed in an incorrect Availability Zone. Either correct your AS Launch Configuration or expand the ELB to the rest of the AZs in your Region.
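
A sketch of that grace-period change (300 seconds is just an example value; as with the other options, verify it against your version of the AS CLI tools):

as-update-auto-scaling-group grupo-prueba --grace-period 300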

Force a Scale-Up:

To test the AS Policy we can lie to CloudWatch and tell it we have much more load than we really have. We will inject fake Busy Workers values into the CW metric:

mon-put-data --metric-name httpd-busyworkers --namespace "AS:grupo-prueba" --unit Count --value 20

mon-put-data --metric-name httpd-busyworkers --namespace "AS:grupo-prueba" --unit Count --value 20

mon-put-data --metric-name httpd-busyworkers --namespace "AS:grupo-prueba" --unit Count --value 20


# mon-get-stats --metric-name httpd-busyworkers --namespace "AS:grupo-prueba" --statistics Average
2012-11-05 15:35:00  2.0   Count
2012-11-05 15:40:00  5.0   Count
2012-11-05 15:45:00  5.0   Count
2012-11-05 15:50:00  5.0   Count
2012-11-05 15:55:00  5.0   Count
2012-11-05 16:00:00  2.0   Count
2012-11-05 16:15:00  5.0   Count
2012-11-05 16:20:00  5.0   Count
2012-11-05 16:21:00  20.0  Count
2012-11-05 16:23:00  20.0  Count

And after a while the average Busy Workers value rises, which triggers the scale-up Alarm and then its AS Policy:

as-describe-scaling-activities --headers --show-long view

ACTIVITY,135c95fa-8d67-4664-85e4-5d78dfb73353,2012-11-05T16:25:13Z,grupo-prueba,Successful,(nil),"At 2012-11-05T16:24:14Z a monitor alarm scale-up-alarm in state ALARM triggered policy scale-up-prueba changing the desired capacity from 1 to 2.  At 2012-11-05T16:24:27Z an instance was started in response to a difference between desired and actual capacity, increasing the capacity from 1 to 2.",100,Launching a new EC2 instance: i-ebeac397,(nil),2012-11-05T16:24:27.687Z

And a second instance is automatically launched:

# as-describe-auto-scaling-instances

INSTANCE  i-9d022be1  grupo-prueba  us-east-1a  InService  HEALTHY  config-prueba
INSTANCE  i-ebeac397  grupo-prueba  us-east-1a  InService  HEALTHY  config-prueba

If we keep feeding CloudWatch with fake values that keep the average high, a third instance will soon be launched:

as-describe-scaling-activities --headers --show-long view

ACTIVITY,ACTIVITY-ID,END-TIME,GROUP-NAME,CODE,MESSAGE,CAUSE,PROGRESS,DESCRIPTION,UPDATE-TIME,START-TIME
ACTIVITY,ef187965-9a79-463f-8a2d-b6f413cc9226,2012-11-05T16:31:11Z,grupo-prueba,Successful,(nil),"At 2012-11-05T16:30:14Z a monitor alarm scale-up-alarm in state ALARM triggered policy scale-up-prueba changing the desired capacity from 2 to 3.  At 2012-11-05T16:30:30Z an instance was started in response to a difference between desired and actual capacity, increasing the capacity from 2 to 3.",100,Launching a new EC2 instance: i-99e4cde5,(nil),2012-11-05T16:30:30.795Z

# as-describe-auto-scaling-instances

INSTANCE  i-99e4cde5  grupo-prueba  us-east-1a  InService  HEALTHY  config-prueba
INSTANCE  i-9d022be1  grupo-prueba  us-east-1a  InService  HEALTHY  config-prueba
INSTANCE  i-ebeac397  grupo-prueba  us-east-1a  InService  HEALTHY  config-prueba

If we leave it alone for a while, the average will decrease and the automatically launched instances will be terminated at 10-minute intervals:

# as-describe-auto-scaling-instances 

INSTANCE  i-99e4cde5  grupo-prueba  us-east-1a  InService    HEALTHY  config-prueba
INSTANCE  i-9d022be1  grupo-prueba  us-east-1a  Terminating  HEALTHY  config-prueba
INSTANCE  i-ebeac397  grupo-prueba  us-east-1a  InService    HEALTHY  config-prueba

# as-describe-scaling-activities --headers --show-long view

ACTIVITY,ACTIVITY-ID,END-TIME,GROUP-NAME,CODE,MESSAGE,CAUSE,PROGRESS,DESCRIPTION,UPDATE-TIME,START-TIME
ACTIVITY,7095a10e-d7b7-4e68-a1c9-cb350e8b0d45,2012-11-05T16:45:03Z,grupo-prueba,Successful,(nil),"At 2012-11-05T16:43:48Z a monitor alarm scale-down-alarm in state ALARM triggered policy scale-down-prueba changing the desired capacity from 3 to 2.  At 2012-11-05T16:44:04Z an instance was taken out of service in response to a difference between desired and actual capacity, shrinking the capacity from 3 to 2.  At 2012-11-05T16:44:04Z instance i-9d022be1 was selected for termination.",100,Terminating EC2 instance: i-9d022be1,(nil),2012-11-05T16:44:04.106Z

# as-describe-auto-scaling-instances 

INSTANCE  i-99e4cde5  grupo-prueba  us-east-1a  InService  HEALTHY  config-prueba
INSTANCE  i-ebeac397  grupo-prueba  us-east-1a  InService  HEALTHY  config-prueba

# as-describe-scaling-activities --headers --show-long view

ACTIVITY,ACTIVITY-ID,END-TIME,GROUP-NAME,CODE,MESSAGE,CAUSE,PROGRESS,DESCRIPTION,UPDATE-TIME,START-TIME
ACTIVITY,31e8673e-7255-410e-b8a7-51ee677f2bb8,(nil),grupo-prueba,InProgress,(nil),"At 2012-11-05T16:50:23Z a monitor alarm scale-down-alarm in state ALARM triggered policy scale-down-prueba changing the desired capacity from 2 to 1.  At 2012-11-05T16:50:35Z an instance was taken out of service in response to a difference between desired and actual capacity, shrinking the capacity from 2 to 1.  At 2012-11-05T16:50:35Z instance i-ebeac397 was selected for termination.",50,Terminating EC2 instance: i-ebeac397,(nil),2012-11-05T16:50:35.538Z

# as-describe-auto-scaling-instances 

INSTANCE  i-99e4cde5  grupo-prueba  us-east-1a  InService  HEALTHY  config-prueba


We have learned something here: an instance in an AS environment is volatile. It can be Terminated at any time, and its EBS volumes disappear with it. You have to take that into account when designing your infrastructure. If your web server produces information you may need later, save it elsewhere: CloudWatch, an external log server, a database, etc.

Also notice that the surviving instance is i-99e4cde5, the last one deployed, while the first to be terminated during the shrinking process was the first member of the group. Auto Scaling uses that logic to help you get more value for your money: EC2 bills by the full hour, so keeping the most recently launched instance alive gives you a chance to use what you've already paid for.

Average of what?

The Policy used in this example is not a perfect method, and this Average metric is a bit confusing. First, note that the Average CPU used in the official Auto Scaling documentation is a native CloudWatch metric: it is created automatically when you define your AS Group. EC2 takes the CPU usage of all the instances in the AS Group and stores the average value there (it does the same with other EC2 metrics: CW Console -> All Metrics pull-down menu -> "EC2: Aggregated by Auto Scaling Group"). An elegant method would be to do the same kind of aggregation with our custom metric, but I don't know how to do that.
So what we have is a single metric name receiving all those different values from our cluster members. It is therefore important that all members send their values in a timely fashion, so as not to distort the average calculation. I think a "crontab */5 * * * *" entry is a good solution, but I'm quite open to other suggestions.

The ELB role:

By default the Load Balancer sends an equal number of connections to each web cluster member, and therefore the number of Apache Busy Workers remains "balanced" across the cluster. The configuration described here is not useful with "sticky sessions": if one web server's connections grow above the other cluster members', it could trigger an unnecessary scale-up action.

Cleaning:

You don't want an AS Group doing things while you sleep, so I suggest deleting all your AS configurations after your test is done.

# as-update-auto-scaling-group grupo-prueba --min-size 0

OK-Updated AutoScalingGroup

# as-update-auto-scaling-group grupo-prueba --desired-capacity 0

OK-Updated AutoScalingGroup

# as-delete-auto-scaling-group grupo-prueba

    Are you sure you want to delete this AutoScalingGroup? [Ny]y
OK-Deleted AutoScalingGroup

# as-delete-launch-config config-prueba

    Are you sure you want to delete this launch configuration? [Ny]y  

OK-Deleted launch configuration

# as-describe-auto-scaling-instances 

No instances found

Saturday, November 3, 2012

AWS EC2 Auto Scaling: Basic Configuration

[Diagram: AWS EC2 Auto Scaling basic configuration]

Our goal: Create an Auto Scaling EC2 Group in a single Availability Zone and use an HTTP status page as a Health Monitor for our Load Balancer and the Auto Scaling group instances.

This exercise will show us some Auto Scaling basics and will be useful for understanding the underlying concepts, but the Auto Scaling Group will not automatically "scale" in response to external influences like Average CPU Usage or Total Apache Connections (that aspect is covered in this post: AWS EC2 Auto Scaling: External CloudWatch Metric). With the Auto Scaling configuration described here we will obtain a web server cluster whose membership can be increased and decreased with a simple Auto Scaling API call, and we will transfer the monitoring role to the ELB to automatically replace failed EC2 instances or web servers.


What we need for the exercise:
This exercise assumes you have previous experience with EC2 Instances, Security Groups, Custom AMIs and EC2 Load Balancers.

- An empty ELB.
- A custom AMI with an HTTP server installed.
- A custom Test Web Page called "ping.html".
- An EC2 Key Pair to use to access our instances.
- An EC2 Security Group.
- Auto Scaling API. If you need help configuring access to the Auto Scaling API, check this post.


Preparation:
It is important to be sure that all the ingredients are working as expected. Auto Scaling can be difficult to debug, and nasty situations may occur, like a group of instances starting while you are away, or a new instance starting and stopping every 20 seconds with bad billing consequences (AWS will charge you a full hour for any started instance, even if it only ran for one minute).
I strongly suggest manually testing your setup before creating an Auto Scaling configuration.

- Create your Key Pair (In my example "juankeys").

- Deploy an ELB (in my example named "elb-prueba") in your default AZ ("a"). Configure the ELB to use your custom /ping.html page as the Instance Health Check. You should see something like this:


- Create a Security Group for your Web Server instances (in my example "web-servers"). Add the ELB Security Group to it for port 80. It should look like the capture below. In this example this SG allows Ping and TCP access from my home to the Instances AND allows access to port 80 for connections originating from my Load Balancers (amazon-elb-sg). The web server port 80 is not open to the Internet, only to the ELB.



- Deploy an EC2 Instance using the previously created Key Pair and Security Group. Install an HTTP server and make sure it is configured to start automatically. Create a Test Page called /ping.html at the web server root folder (see the sketch below). This test page can print any text you like; its only mission is to be present. An HTTP 200 is OK and anything else is KO.
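
For example, a minimal sketch assuming Apache on Amazon Linux with its default document root:

echo "pong" > /var/www/html/ping.html
curl -s -o /dev/null -w "%{http_code}\n" http://localhost/ping.html

The curl call should print 200 once the page is being served.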

- Create your Custom AMI from the previously created temporary instance. Terminate the temporary instance when finished.

- Deploy a new instance using the recently created AMI (in my example "ami-1ceb5075") to test it. Check that the HTTP server starts automatically.

- Manually add the recently created instance to the ELB. Verify that the Load Balancer check works and gives this instance the status "In Service". Verify that the /ping.html page can be accessed from the Internet using a browser and the ELB public DNS name ("http://(your-ELB-DNS-name)/ping.html").

- Once everything checks OK, remove the instance from the ELB and Terminate the instance.


Definition:

# as-create-launch-config config-prueba --image-id ami-1ceb5075 --instance-type t1.micro --monitoring-disabled --group web-servers --key juankeys

OK-Created launch config

# as-create-auto-scaling-group grupo-prueba --launch-configuration config-prueba --availability-zones us-east-1a --min-size 0 --max-size 4 --load-balancers elb-prueba --health-check-type ELB --grace-period 120

OK-Created AutoScalingGroup

With as-create-launch-config we define the instance configuration we will use in our Auto Scaling Group: Launch Config name, AMI ID, Instance Type, Advanced Monitoring (1-minute monitoring) disabled, Security Group and Key Pair.

With as-create-auto-scaling-group we define the group itself: Group Name, Launch Config to use, AZs to deploy in, the minimum number of running instances our application needs, the maximum number of instances we want to scale up to, ELB name, the Health Check type set to ELB (by default it is the EC2 System Status) and the grace period (in seconds) granted to an instance after launch before it is health-checked.

Note: By default all the API calls are sent to the us-east-1 Region (N.Virginia).


Describe:

# as-describe-launch-configs --headers

LAUNCH-CONFIG  NAME           IMAGE-ID      TYPE
LAUNCH-CONFIG  config-prueba  ami-1ceb5075  t1.micro  

# as-describe-auto-scaling-groups --headers


AUTO-SCALING-GROUP  GROUP-NAME    LAUNCH-CONFIG  AVAILABILITY-ZONES  LOAD-BALANCERS  MIN-SIZE  MAX-SIZE  DESIRED-CAPACITY  TERMINATION-POLICIES
AUTO-SCALING-GROUP  grupo-prueba  config-prueba  us-east-1a          elb-prueba      0         4         0                 Default 

We use "as-describe-" commands to read the result of our last configuration. Special attention to as-describe-auto-scaling-instances:

# as-describe-auto-scaling-instances --headers   

No instances found

This command gives us a quick look at the running instances within our AS Groups. It is very useful when dealing with AS to find out how many instances are running and their state. Right now the result is "No instances found", and this is correct: our current configuration says that zero is the minimum number of healthy instances our application needs, and therefore zero is the result.

Bring it to Production:

Let's tell AS that the minimum is now 1 and describe the configuration:

# as-update-auto-scaling-group grupo-prueba --min-size 1

OK-Updated AutoScalingGroup

#  as-describe-auto-scaling-groups --headers

AUTO-SCALING-GROUP  GROUP-NAME    LAUNCH-CONFIG  AVAILABILITY-ZONES  LOAD-BALANCERS  MIN-SIZE  MAX-SIZE  DESIRED-CAPACITY  TERMINATION-POLICIES
AUTO-SCALING-GROUP  grupo-prueba  config-prueba  us-east-1a          elb-prueba      1         4         1                 Default          
INSTANCE  INSTANCE-ID  AVAILABILITY-ZONE  STATE    STATUS   LAUNCH-CONFIG
INSTANCE  i-5bb9e427   us-east-1a         Pending  Healthy  config-prueba

Notice that the minimum is now 1 in the AS configuration and there is a new instance in our AS Group ("i-5bb9e427" in this example). This instance has been automatically deployed by AS to match the desired number of healthy instances for our application. Notice the "Pending" state, which means it is still initializing. We can follow the process with as-describe-auto-scaling-instances:

#  as-describe-auto-scaling-instances
INSTANCE  i-5bb9e427  grupo-prueba  us-east-1a  Pending  HEALTHY  config-prueba
# as-describe-auto-scaling-instances
INSTANCE  i-5bb9e427  grupo-prueba  us-east-1a  Pending  HEALTHY  config-prueba
# as-describe-auto-scaling-instances
INSTANCE  i-5bb9e427  grupo-prueba  us-east-1a  Pending  HEALTHY  config-prueba
# as-describe-auto-scaling-instances
INSTANCE  i-5bb9e427  grupo-prueba  us-east-1a  Pending  HEALTHY  config-prueba
# as-describe-auto-scaling-instances
INSTANCE  i-5bb9e427  grupo-prueba  us-east-1a  InService  HEALTHY  config-prueba 
# as-describe-auto-scaling-instances
INSTANCE  i-5bb9e427  grupo-prueba  us-east-1a  InService  HEALTHY  config-prueba
# as-describe-auto-scaling-instances
INSTANCE  i-5bb9e427  grupo-prueba  us-east-1a  InService  HEALTHY  config-prueba

Now the recently deployed instance is in service, which means its Health Check (the ELB ping.html test page) verifies OK. If you open the AWS Console and look at the ELB "Instances" tab, the new instance ID should be there, automatically added to the Load Balancer, and your application up and running.

Common problem scenarios:
- If you observe that new instances are constantly Deployed and Terminated by AS, it probably means the ping.html page check fails. Stop the experiment with "as-update-auto-scaling-group grupo-prueba --min-size 0" and verify your components.
- If your web server and test page verify OK but AS keeps Deploying and Terminating instances before they can reach the Healthy status, increase the value of "--grace-period" in the AS Group definition to give your AMI more time to start and initialize its services.
- If the instances start but fail to be added to the ELB automatically, they are probably being deployed in an incorrect Availability Zone. Either correct your AS Launch Configuration or expand the ELB to the rest of the AZs in your Region.


Sabotage:

Log in as root to the recently deployed AS Instance and force it to fail with this command: "mv /var/www/html/ping.html /var/www/html/ping.html.KO". You can see in the /var/log/httpd/access_log file that the ELB keeps looking for the test page and failing:

10.29.36.216 - - [03/Nov/2012:12:23:45 +0000] "GET /ping.html HTTP/1.1" 200 49 "-" "ELB-HealthChecker/1.0"
10.29.36.216 - - [03/Nov/2012:12:23:51 +0000] "GET /ping.html HTTP/1.1" 200 49 "-" "ELB-HealthChecker/1.0"
10.29.36.216 - - [03/Nov/2012:12:23:57 +0000] "GET /ping.html HTTP/1.1" 404 286 "-" "ELB-HealthChecker/1.0"
10.29.36.216 - - [03/Nov/2012:12:24:03 +0000] "GET /ping.html HTTP/1.1" 404 286 "-" "ELB-HealthChecker/1.0"
10.29.36.216 - - [03/Nov/2012:12:24:09 +0000] "GET /ping.html HTTP/1.1" 404 286 "-" "ELB-HealthChecker/1.0"
10.29.36.216 - - [03/Nov/2012:12:24:15 +0000] "GET /ping.html HTTP/1.1" 404 286 "-" "ELB-HealthChecker/1.0"

Let's see what happens soon after.

# as-describe-auto-scaling-instances
INSTANCE  i-5bb9e427  grupo-prueba  us-east-1a  InService  HEALTHY  config-prueba
# as-describe-auto-scaling-instances
INSTANCE  i-5bb9e427  grupo-prueba  us-east-1a  InService  HEALTHY  config-prueba
# as-describe-auto-scaling-instances
INSTANCE  i-5bb9e427  grupo-prueba  us-east-1a  InService  HEALTHY  config-prueba
# as-describe-auto-scaling-instances
INSTANCE  i-5bb9e427  grupo-prueba  us-east-1a  InService  UNHEALTHY  config-prueba
# as-describe-auto-scaling-instances
INSTANCE  i-5bb9e427  grupo-prueba  us-east-1a  InService  UNHEALTHY  config-prueba
# as-describe-auto-scaling-instances
INSTANCE  i-5bb9e427  grupo-prueba  us-east-1a  Terminating  UNHEALTHY  config-prueba
# as-describe-auto-scaling-instances
INSTANCE  i-5bb9e427  grupo-prueba  us-east-1a  Terminating  UNHEALTHY  config-prueba
# as-describe-auto-scaling-instances
INSTANCE  i-5bb9e427  grupo-prueba  us-east-1a  Terminating  UNHEALTHY  config-prueba
# as-describe-auto-scaling-instances
INSTANCE  i-5bb9e427  grupo-prueba  us-east-1a  Terminating  UNHEALTHY  config-prueba
# as-describe-auto-scaling-instances
INSTANCE  i-3dce9341  grupo-prueba  us-east-1a  Pending      HEALTHY    config-prueba
INSTANCE  i-5bb9e427  grupo-prueba  us-east-1a  Terminating  UNHEALTHY  config-prueba
# as-describe-auto-scaling-instances
INSTANCE  i-3dce9341  grupo-prueba  us-east-1a  Pending      HEALTHY    config-prueba
INSTANCE  i-5bb9e427  grupo-prueba  us-east-1a  Terminating  UNHEALTHY  config-prueba  
# as-describe-auto-scaling-instances
INSTANCE  i-3dce9341  grupo-prueba  us-east-1a  Pending  HEALTHY  config-prueba
# as-describe-auto-scaling-instances
INSTANCE  i-3dce9341  grupo-prueba  us-east-1a  Pending  HEALTHY  config-prueba
# as-describe-auto-scaling-instances
INSTANCE  i-3dce9341  grupo-prueba  us-east-1a  Pending  HEALTHY  config-prueba
# as-describe-auto-scaling-instances
INSTANCE  i-3dce9341  grupo-prueba  us-east-1a  InService  HEALTHY  config-prueba
# as-describe-auto-scaling-instances
INSTANCE  i-3dce9341  grupo-prueba  us-east-1a  InService  HEALTHY  config-prueba
# as-describe-auto-scaling-instances
INSTANCE  i-3dce9341  grupo-prueba  us-east-1a  InService  HEALTHY  config-prueba

After a while, our initial instance (i-5bb9e427) is declared UNHEALTHY and marked for Termination because the Test Page failed several times. At the same time a new instance (i-3dce9341) is deployed, tested, and added to the ELB to match our "minimum=1" criterion. Auto Scaling (together with the ELB) monitors our cluster: any failed instance will be removed and a new one launched.

We have learned something here: an instance in an AS environment is volatile. It can be Terminated at any time, and its EBS volumes disappear with it. You have to take that into account when designing your infrastructure. If your web server produces information you may need later, save it elsewhere: CloudWatch, an external log server, a database, etc.


Maneuvers:

Changing the minimum number of instances in the AS configuration is one way to change the number of running instances, but there are others.

- We can force the number of running instances by changing the "--desired-capacity" in the AS Group definition:

as-update-auto-scaling-group grupo-prueba --desired-capacity X 

- You can scale by Schedule: AWS Scaling by Schedule documentation (see the sketch after this list).

- And you can scale by Policy. This aspect is covered in this post: AWS EC2 Auto Scaling: External CloudWatch Metric.
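
For example, a sketch of a scheduled action using the classic AS CLI (the action name "scheduled-prueba" is my choice and the recurrence is standard cron syntax; verify the options against your tools version):

as-put-scheduled-update-group-action scheduled-prueba --auto-scaling-group grupo-prueba --desired-capacity 2 --recurrence "0 7 * * *"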


Cleaning:

You don't want an AS Group doing things while you sleep, so I suggest deleting all your AS configurations after your test is done.

# as-update-auto-scaling-group grupo-prueba --min-size 0

OK-Updated AutoScalingGroup

# as-update-auto-scaling-group grupo-prueba --desired-capacity 0

OK-Updated AutoScalingGroup

# as-delete-auto-scaling-group grupo-prueba


    Are you sure you want to delete this AutoScalingGroup? [Ny]y

OK-Deleted AutoScalingGroup


# as-delete-launch-config config-prueba


    Are you sure you want to delete this launch configuration? [Ny]y  

OK-Deleted launch configuration


# as-describe-auto-scaling-instances 

No instances found

Friday, June 1, 2012

IPv6 Hello World!

[Image: IPv6 World Launch]

After a little set up for surfing with IPv6, it's time for an "IPv6 Hello World!". Ingredients: an AWS EC2 instance, an EC2 ELB and an Apache HTTP server.

First, deploy one EC2 instance. I always use the default Amazon Linux 64-bit AMI; I'm used to RedHat and CentOS Linux, and this AMI is basically the same. Then install your favourite web server flavour. This instance will have an IPv4 address, and that's all we need: the IPv6 magic happens on the ELB's public side. There's no way (and no need, for now) to attach an IPv6 address to your instance.

Once that is done, deploy an ELB and attach the instance to it. Notice on the ELB "Description" tab that you have 3 DNS records for it.

[Screenshot: ELB DNS A, AAAA and dualstack records]


In my case:

domenech-1821931935.us-east-1.elb.amazonaws.com (A Record)
ipv6.domenech-1821931935.us-east-1.elb.amazonaws.com (AAAA Record)
dualstack.domenech-1821931935.us-east-1.elb.amazonaws.com (A or AAAA Record)

Let's take a closer look. The first DNS record (A Record) is the typical IPv4 record, the one you usually point your CNAME to.

# host domenech-1821931935.us-east-1.elb.amazonaws.com
domenech-1821931935.us-east-1.elb.amazonaws.com has address 23.21.124.217
# host ipv6.domenech-1821931935.us-east-1.elb.amazonaws.com
ipv6.domenech-1821931935.us-east-1.elb.amazonaws.com has IPv6 address 2406:da00:ff00::1715:7cd9

So, if we resolve the A Record we get an IPv4 address (23.21.124.217 in my example), and with the AAAA Record we get the IPv6 address (2406:da00:ff00::1715:7cd9). They are there waiting for us to use them. No more configuration needed.

Searching for this IP in this BGP AS database shows that it belongs to the Autonomous System AS16509, prefix 2406:da00::/32, from Amazon.com; in other words, it is part of the AWS IPv6 infrastructure. A 32-bit prefix leaves 96 bits of address space inside it (IPv6 = 128 bits), and that is 2^96 = 79,228,162,514,264,337,593,543,950,336 IPs. Nice!

Another interesting thing is that the AAAA Record "implies" the A Record. An IPv6 address is formed by 8 "hexquads", each 16 bits long, separated by colons and written in lower-case hexadecimal. A double colon (::) means "fill with zeros". In my example, the IPv6 address 2406:da00:ff00::1715:7cd9 expands to 2406:da00:ff00:0000:0000:0000:1715:7cd9. If we take the last 8 hexadecimal digits, grouped in pairs, and convert them to decimal:
0x17 = 23
0x15 = 21
0x7c = 124
0xd9 = 217
And this is 23.21.124.217, the IPv4 address that this ELB also provides.
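
You can check the conversion with a shell one-liner (printf interprets the 0x-prefixed values as hexadecimal):

$ printf '%d.%d.%d.%d\n' 0x17 0x15 0x7c 0xd9
23.21.124.217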

Now we just have to create the CNAME record for our domain pointing to the AWS ELB. We can choose either the AAAA Record or the "dualstack" (A and AAAA) Record. The dualstack record answers with an IPv4 address when the DNS query asks for an A record, and with an IPv6 address when it asks for a AAAA record.

Dig for A Record:
# dig ipv6.domenech.org A @2001:4860:4860::8888
; <<>> DiG 9.8.1-P1 <<>> ipv6.domenech.org A @2001:4860:4860::8888
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 56239
;; flags: qr rd ra; QUERY: 1, ANSWER: 2, AUTHORITY: 0, ADDITIONAL: 0
;; QUESTION SECTION:
;ipv6.domenech.org.        IN    A
;; ANSWER SECTION:
ipv6.domenech.org.    60    IN    CNAME    dualstack.domenech-1821931935.us-east-1.elb.amazonaws.com.
dualstack.domenech-1821931935.us-east-1.elb.amazonaws.com. 60 IN A 23.21.124.217
;; Query time: 216 msec
;; SERVER: 2001:4860:4860::8888#53(2001:4860:4860::8888)
;; WHEN: Tue Jun  5 11:58:20 2012
;; MSG SIZE  rcvd: 122

Dig for AAAA Record:
# dig ipv6.domenech.org AAAA @2001:4860:4860::8888
; <<>> DiG 9.8.1-P1 <<>> ipv6.domenech.org AAAA @2001:4860:4860::8888
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 56671
;; flags: qr rd ra; QUERY: 1, ANSWER: 2, AUTHORITY: 0, ADDITIONAL: 0
;; QUESTION SECTION:
;ipv6.domenech.org.        IN    AAAA
;; ANSWER SECTION:
ipv6.domenech.org.    57    IN    CNAME    dualstack.domenech-1821931935.us-east-1.elb.amazonaws.com.
dualstack.domenech-1821931935.us-east-1.elb.amazonaws.com. 60 IN AAAA 2406:da00:ff00::1715:7cd9
;; Query time: 123 msec
;; SERVER: 2001:4860:4860::8888#53(2001:4860:4860::8888)
;; WHEN: Tue Jun  5 11:58:23 2012
;; MSG SIZE  rcvd: 134

Note: 2001:4860:4860::8888 is a Google IPv6 DNS Server.

This duality is something to keep in mind when testing IPv6: we have to be certain whether our browser will ask for an AAAA record or not.
And basically that's it. With the EC2 instance up, the web site up, and our CNAME ready in our DNS server (I used http://ipv6.domenech.org), you just need to open a browser and type the URL (or test from the command line as shown below).
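
curl's -4 and -6 flags force one address family or the other (-I requests only the headers):

# curl -4 -I http://ipv6.domenech.org/
# curl -6 -I http://ipv6.domenech.org/

The first request goes over IPv4 through the A record; the second goes over IPv6 through the AAAA record (your machine needs IPv6 connectivity for the latter).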

Ta-raaaaa!
[Screenshot: http://ipv6.domenech.org loaded in a browser]


Appendix.
IP Source: Do not expect to read IPv6 addresses in your Apache log files. All the communication between the ELB and EC2 is IPv4. By default, all the connections to your instance come from the ELB's internal IP (something like 10.28.x.x), and that is what you will get in the logs. To log your clients' IPs instead of the ELB's, change the default Apache configuration by adding %{X-Forwarded-For}i to your LogFormat (see the sketch below). To make this information available to your application, read the X-Forwarded-For header provided by the ELB (exposed as HTTP_X_FORWARDED_FOR in PHP's $_SERVER). The best way to start dealing with headers is to create a PHP test page that prints all the headers that come with every request. Don't forget to delete this page when it is no longer needed, to avoid giving away too much information.
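
For example, a minimal sketch of the Apache side (the format name "elb_combined" is my choice; adapt the rest of the format string to whatever you already use):

LogFormat "%{X-Forwarded-For}i %l %u %t \"%r\" %>s %b" elb_combined
CustomLog logs/access_log elb_combined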