Amazon Web Services Recipes
These are some basic recipes for working with the Amazon Web Services APIs. I'm mostly interested in S3, EC2, and SQS.
Working with S3
Simple Storage Service (S3) is a key/value based online storage system. This is my first puppet extension, so please forgive any obvious mistakes. A function on the puppet server generates pre-signed URLs so that the S3 authentication credentials do not have to be shared with the puppet clients. Each pre-signed URL carries an "Expires" field that prevents the URL from being used past the specified time, so the expires parameter in the manifest must be large enough that the URL is still fresh by the time the client gets around to using it.
Why would you use S3 to store files rather than a Puppet filebucket? If you administer a far-flung empire of machines, S3 might be faster than a single puppet server. S3 should also do a better job of load balancing large file downloads.
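To make the timing concrete, here is a minimal Ruby sketch of how a relative expires value turns into the absolute "Expires" timestamp in the URL. The helper names expires_at and fresh? are hypothetical and not part of the recipe, and the freshness check only approximates what S3 does on its side.

```ruby
# Hypothetical helpers, for illustration only; not part of the recipe.

# Convert "seconds from now" into the absolute Unix timestamp that
# ends up in the URL's Expires field.
def expires_at(expires_in, now = Time.now.to_i)
  now + expires_in.to_i
end

# Rough client-side view of whether a URL could still be used.
def fresh?(expires_timestamp, now = Time.now.to_i)
  now < expires_timestamp
end

generated = 1_187_964_481                 # assumed time the URL was generated
deadline  = expires_at(30, generated)
puts deadline                             # 1187964511
puts fresh?(deadline, generated + 10)     # true
puts fresh?(deadline, generated + 45)     # false
```

If the client may take longer than that between receiving its configuration and running the exec, the expires value needs to be raised accordingly.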
This is the puppet definition for downloading a file from S3 to the puppet client:
define s3get ($bucket=$DEFAULTS3BUCKET, $cwd="/tmp", $expires=30) {
    exec { "s3getcurl[$bucket][$title][$name]":
        cwd => $cwd,
        creates => "$cwd/$name",
        command => s3getcurl($bucket, $title, $name, $expires),
    }
}
Use it like this:
s3get {"$prefix/$fireholrpm":
    notify => Package[firehol],
}
This downloads an RPM and by default sticks it in "/tmp/$prefix/$fireholrpm". The bucket must be specified by $DEFAULTS3BUCKET and the key must be named "$prefix/$fireholrpm". By default the expires parameter is set to 30 seconds. One can also specify the parameters explicitly:
s3get { "$prefixconf$fireholconfdir/$fireholconf":
    cwd => "$fireholconfdir",
    name => "$fireholconf",
    expires => "90",
    bucket => "otherbucket",
}
This pulls the key "$prefixconf$fireholconfdir/$fireholconf" from "otherbucket" and stores the value as a file named $fireholconf in $fireholconfdir. Note that in this example the bucket, key name, destination directory, and filename can all be different.
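The mapping from the definition's parameters to the S3 key and the local file can be modelled in a few lines of Ruby. This is only a sketch: s3get_plan is a hypothetical helper, and the bucket names, keys, and paths are made-up values standing in for the variables used above.

```ruby
# Hypothetical model of the s3get parameters, for illustration only.
# As in the definition, name falls back to the title and cwd to "/tmp".
def s3get_plan(title, bucket, name: title, cwd: '/tmp')
  { bucket: bucket, key: title, destination: File.join(cwd, name) }
end

# Defaults: the key doubles as the relative path under /tmp.
simple = s3get_plan('rpms/firehol.rpm', 'defaultbucket')
puts simple[:destination]     # /tmp/rpms/firehol.rpm

# Explicit parameters: bucket, key, directory, and filename all differ.
explicit = s3get_plan('conf/etc/firehol/firehol.conf', 'otherbucket',
                      name: 'firehol.conf', cwd: '/etc/firehol')
puts explicit[:key]           # conf/etc/firehol/firehol.conf
puts explicit[:destination]   # /etc/firehol/firehol.conf
```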
The URL Generating Function
The function s3getcurl($bucket, $title, $name, $expires) is run on the puppet server to generate the signed URL and a formatted curl command line. This is an example of the output (shown without the --create-dirs and -o options that the function also adds):
curl -s -f 'https://s3.amazonaws.com/testbucket/testkey?Signature=G8CKO6OcaNE0JdyxL00oInKrJ5Q%3D&Expires=1187964511&AWSAccessKeyId=0EYWE832126XK7AB4582'
The puppet client can then execute this command to perform the operation. If the URL is modified or expired it is useless.
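Why a modified URL is useless can be shown with a condensed sketch of the signature calculation. It covers only the header-less GET case and is not a replacement for the S3Sign class further down; sign_get is a hypothetical name and the secret is a placeholder.

```ruby
require 'openssl'
require 'base64'
require 'cgi'

# Condensed sketch of query-string signing for a GET with no extra headers.
# The empty lines in the string stand for the absent Content-MD5 and
# Content-Type values.
def sign_get(secret, path, expires)
  string_to_sign = "GET\n\n\n#{expires}\n#{path}"
  digest = OpenSSL::HMAC.digest(OpenSSL::Digest.new('sha1'), secret, string_to_sign)
  CGI.escape(Base64.strict_encode64(digest))
end

secret  = 'not-a-real-secret'   # placeholder, never a real key
expires = 1_187_964_511
good    = sign_get(secret, '/testbucket/testkey', expires)

# Changing the key or pushing out the expiry yields a different signature,
# so the original Signature value no longer matches the altered request.
puts good == sign_get(secret, '/testbucket/otherkey', expires)        # false
puts good == sign_get(secret, '/testbucket/testkey', expires + 3600)  # false
```

Because the path and the Expires value are both part of the signed string, a client cannot edit either one without knowing the secret key.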
This is the s3getcurl() function:
require 'puppet/parser/functions/s3sign.rb'
module Puppet::Parser::Functions
  newfunction(:s3getcurl, :type => :rvalue) do |args|
    bucket = args[0]
    key = args[1]
    filename = args[2]
    expires = args[3] # in seconds from Time.now.to_i
    headers = { }
    s3 = S3Sign.new(ENV['AWS_ACCESS_KEY_ID'], ENV['AWS_SECRET_ACCESS_KEY'] )
    url = s3.get(bucket, key, expires, headers)
    heads = headers.map{|k,v| "-H '#{k}: #{v}'"}.join(' ')
    cmd = "curl #{heads} --create-dirs -s -f -o #{filename} 'https://s3.amazonaws.com#{url}'"
    return cmd
  end
end
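One thing I have not handled in the function is a filename containing spaces or shell metacharacters, since it is interpolated into the command line as-is. A possible way to harden the command-building step, sketched here with Ruby's standard Shellwords module, is shown below. The helper build_curl_command is hypothetical and the signed path is a dummy value.

```ruby
require 'shellwords'

# Hypothetical variant of the command-building step in s3getcurl().
# signed_path is the "/bucket/key?Signature=..." string returned by S3Sign#get.
def build_curl_command(signed_path, filename, headers = {})
  heads = headers.map { |k, v| "-H #{Shellwords.escape("#{k}: #{v}")}" }
  url   = Shellwords.escape("https://s3.amazonaws.com#{signed_path}")
  parts = ['curl'] + heads +
          ['--create-dirs', '-s', '-f', '-o', Shellwords.escape(filename), url]
  parts.join(' ')
end

# Dummy signed path, for illustration only.
path = '/testbucket/testkey?Signature=abc%3D&Expires=1187964511&AWSAccessKeyId=EXAMPLE'
cmd  = build_curl_command(path, 'my file.rpm')
puts cmd

# Splitting the command the way a shell would gives back the original
# arguments, including the filename with its space intact.
puts Shellwords.split(cmd)[5]   # my file.rpm
```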
This is the supporting s3sign.rb file. This file is only slightly modified from Thorsten von Eicken's blog post on the subject.
require 'digest/sha1'
require 'openssl'
require 'cgi'
require 'base64'

## The S3Sign class generates signed URLs for Amazon S3
class S3Sign

  def initialize(aws_access_key_id, aws_secret_access_key)
    @aws_access_key_id = aws_access_key_id
    @aws_secret_access_key = aws_secret_access_key
  end

  # builds the canonical string for signing.
  def canonical_string(method, path, headers={}, expires=nil)
    interesting_headers = {}
    headers.each do |key, value|
      lk = key.downcase
      if lk == 'content-md5' or lk == 'content-type' or lk == 'date' or lk =~ /^x-amz-/
        interesting_headers[lk] = value.to_s.strip
      end
    end
    # these fields get empty strings if they don't exist.
    interesting_headers['content-type'] ||= ''
    interesting_headers['content-md5'] ||= ''
    # just in case someone used this. it's not necessary in this lib.
    interesting_headers['date'] = '' if interesting_headers.has_key? 'x-amz-date'
    # if you're using expires for query string auth, then it trumps date (and x-amz-date)
    interesting_headers['date'] = expires if not expires.nil?
    buf = "#{method}\n"
    interesting_headers.sort { |a, b| a[0] <=> b[0] }.each do |key, value|
      buf << ( key =~ /^x-amz-/ ? "#{key}:#{value}\n" : "#{value}\n" )
    end
    # ignore everything after the question mark...
    buf << path.gsub(/\?.*$/, '')
    # ...unless there is an acl or torrent parameter
    if path =~ /[&?]acl($|&|=)/ then buf << '?acl'
    elsif path =~ /[&?]torrent($|&|=)/ then buf << '?torrent'
    end
    return buf
  end

  def hmac_sha1_digest(key, str)
    #STDERR.puts "SIGN: #{str}"
    OpenSSL::HMAC.digest(OpenSSL::Digest::SHA1.new, key, str)
  end

  # encodes the given string with the aws_secret_access_key, by taking the
  # hmac-sha1 sum, and then base64 encoding it. then url-encodes for query string use
  def encode(str)
    CGI::escape(Base64.encode64(hmac_sha1_digest(@aws_secret_access_key, str)).strip)
  end

  # generate a url to put a file onto S3
  def put(bucket, key, expires_in=0, headers={})
    return generate_url('PUT', "/#{bucket}/#{CGI::escape key}", expires_in, headers)
  end

  # generate a url to get a file from S3
  def get(bucket, key, expires_in=0, headers={})
    return generate_url('GET', "/#{bucket}/#{CGI::escape key}", expires_in, headers)
  end

  # generate a url with the appropriate query string authentication parameters set.
  def generate_url(method, path, expires_in, headers)
    #log "path is #{path}"
    expires = expires_in.nil? ? 0 : Time.now.to_i + expires_in.to_i
    canonical_string = canonical_string(method, path, headers, expires)
    encoded_canonical = encode(canonical_string)
    arg_sep = path.index('?') ? '&' : '?'
    return path + arg_sep + "Signature=#{encoded_canonical}&" +
      "Expires=#{expires}&AWSAccessKeyId=#{@aws_access_key_id}"
  end
end
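One detail of get() and put() worth seeing in isolation is that the key is passed through CGI::escape before it becomes part of the signed path. The sketch below mirrors just that step. The name object_path is hypothetical and the bucket and keys are made-up examples.

```ruby
require 'cgi'

# Mirrors how get() and put() build the path that is signed.
def object_path(bucket, key)
  "/#{bucket}/#{CGI.escape(key)}"
end

puts object_path('testbucket', 'testkey')            # /testbucket/testkey
puts object_path('testbucket', 'rpms/firehol.rpm')   # /testbucket/rpms%2Ffirehol.rpm
puts object_path('testbucket', 'my file.txt')        # /testbucket/my+file.txt
```

Slashes inside a key therefore appear as %2F in the generated URL. Keys containing spaces probably deserve a test of their own, because a "+" in a URL path is not generally decoded as a space.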
Enhancements To Do
I'm a rookie at puppet and ruby... so I'm all ears for any suggestions.
* It might be nice not to have the client be dependent on curl. This would require having the client puppet instance be extended with S3 functionality.
* Maybe S3 should be a new kind of file bucket that could be used in the same way?
* I'd like to implement an s3putcurl function as well.
* In an ideal world the s3get definition would check not only for the existence of the file but the md5sum as well. If the md5sum changes on S3 then the file should be re-downloaded over the old one.
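For the md5sum idea in the last item, the client-side comparison might look roughly like the Ruby sketch below. The helper needs_download? is hypothetical, and how the expected checksum would reach the client (for example as an extra parameter to the definition) is an open assumption, not something the recipe does today.

```ruby
require 'digest/md5'
require 'tempfile'

# Hypothetical helper: true when the local file is missing or its MD5
# differs from the expected hex digest.
def needs_download?(path, expected_md5)
  return true unless File.file?(path)
  Digest::MD5.file(path).hexdigest != expected_md5.downcase
end

Tempfile.create('s3get-demo') do |f|
  f.write('hello')
  f.flush
  puts needs_download?(f.path, Digest::MD5.hexdigest('hello'))   # false
  puts needs_download?(f.path, Digest::MD5.hexdigest('other'))   # true
end
puts needs_download?('/nonexistent/s3get-demo-file', 'abc')      # true
```

A check like this could replace the plain "creates" test, so that a changed checksum triggers a fresh download over the old file.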