I’ve just tried using Terraform and Packer to create a Mesos + Ceph cluster on AWS. Yes, I know Mesosphere provides applications that deploy a Mesos cluster on some IaaS platforms (see Getting Started), but I wanted to understand what’s actually going on under the hood. So I built it myself with Terraform and Packer, and I’m going to explain it in a bit more detail.
Through this work I learned a lot, and I’ll write about that below too.
What is this?
This repository includes a Terraform module and a Packer template. The Terraform module manages a VPC, a subnet, an internet gateway, security groups, etc., plus the instances below:
- admin
  - SSH gateway
  - runs ceph-deploy
- master1, master2, master3
  - Mesos master
  - Marathon
  - Ceph MON
  - Ceph MDS
  - mounts CephFS
- slaves (default: 3)
  - Mesos slave
  - Ceph OSD with EBS
  - mounts CephFS
Why manage a Ceph cluster in addition to Mesos? In Mesos, I wanted to use a shared file system to:

- share executor files, etc.
- control cluster software through Mesos
- persist files uploaded to Mesos tasks (e.g. image files for blog apps)
- or just for fun :)
How to use?
It’s very simple: write your own Terraform config that uses this module.
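A hypothetical example of such a config — the module source path and variable names here are my assumptions for illustration, not necessarily the module’s actual interface:

```hcl
module "mesos_ceph" {
  # path to this repository's Terraform module (assumed layout)
  source = "./terraform"

  # assumed inputs: credentials, key pair, and slave count
  access_key  = "${var.access_key}"
  secret_key  = "${var.secret_key}"
  key_name    = "my-key"
  slave_count = 3
}
```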
After spinning up all the instances, a virtual resource runs provisioners to initialize the Ceph cluster. Once this virtual resource has been created, the initialization procedure won’t be executed again.
Ceph initializing and provisioning
Given the initialization process above, there are two types of resource creation: initializing and provisioning.
Initializing means the very first run. At this point, the individual instances don’t need to be provisioned, because the cluster initialization process does everything.
Provisioning, on the other hand, means anything that happens after the cluster is initialized. For example, when a master/slave/admin instance is terminated, Terraform re-creates the instance; that is provisioning. It’s a slightly different process from initializing because the cluster already exists.
See terraform/scripts/init_master.sh; the if ceph_initialized; block handles exactly this problem. As a result, any instance can be terminated at any time, even after the Ceph cluster has been initialized. Try terminating an instance and running terraform apply!
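I won’t reproduce the real script here, but the guard presumably looks something like this minimal sketch — the marker-file check is my assumption, based on ceph-deploy dropping a `done` file in a monitor’s data directory:

```shell
# Hypothetical sketch of the ceph_initialized guard in init_master.sh.
# Assumption: an existing monitor data directory containing a "done"
# marker means the Ceph cluster was already initialized on this node.
ceph_initialized() {
  mon_dir="${1:-/var/lib/ceph/mon}"
  # succeeds only if at least one "done" marker exists under the mon dir
  ls "${mon_dir}"/*/done > /dev/null 2>&1
}

if ceph_initialized; then
  echo "re-provisioning: joining the existing cluster"
else
  echo "initializing: waiting for ceph-deploy from the admin node"
fi
```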
SSH gateway and provisioners
Terraform provisioners assume the instance can be reached directly via SSH. In this module, I don’t open the SSH port on the cluster instances, only on the admin instance, which therefore acts as an SSH gateway.
If you point the connection block at the gateway instead of the actual instance, you can still use provisioners, but they run on the gateway.
So I copy the AWS key file onto the admin instance so that it can log in to the other instances (there is no way to use SSH agent forwarding so far).
For each instance’s provisioning, I upload script files prefixed with the instance ID to prevent a race condition where they overwrite each other. See terraform/master1.tf. By the way, the script_path option for connection is undocumented so far.
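The pattern looks roughly like this — the host reference, user, key file, and paths are my assumptions for illustration, not the module’s actual values:

```hcl
provisioner "file" {
  source      = "scripts/init_master.sh"
  # destination prefixed with the instance name so parallel
  # uploads through the shared gateway don't collide
  destination = "/tmp/master1_init_master.sh"
}

connection {
  # connect to the admin gateway, not to master1 itself,
  # because only the admin instance has its SSH port open
  host        = "${aws_instance.admin.public_ip}"
  user        = "ubuntu"
  key_file    = "my-key.pem"
  # undocumented option: where Terraform places its own wrapper
  # script on the remote host, also prefixed per instance
  script_path = "/tmp/master1_terraform.sh"
}
```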
Master IP addresses
I have to fix the master IP addresses because they are hardcoded in many places. I tried defining them as a single Terraform variable holding a list of IP addresses, but so far that is impossible.

Because of that, I fixed the number of masters and created a separate aws_instance resource for each.

When the list-variable feature is implemented, I will refactor these configurations.
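Concretely, the duplication looks something like this — the resource names and addresses are my guesses, not the repository’s actual values:

```hcl
# Without list variables, each master is its own resource with a
# hardcoded private IP that is repeated in the provisioning scripts.
resource "aws_instance" "master1" {
  ami        = "${var.ami}"
  private_ip = "10.0.1.11"
  # ...
}

resource "aws_instance" "master2" {
  ami        = "${var.ami}"
  private_ip = "10.0.1.12"
  # ...
}
```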
Concatenate provisioner scripts
I use bash scripts for the provisioners, and they share a lot of common functions, so I wrote a shared script, terraform/scripts/header.sh.
To run each provisioner correctly, this file must be concatenated with the per-instance script, and a call to the entry point main must be appended as well. So I did it like this:

```
provisioner "remote-exec" {
  inline = [
    "echo main foo bar | cat header.sh init_foo.sh - | bash"
  ]
}
```
I think there are much better ways, though…
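To see why this one-liner works, here is a self-contained reproduction; the contents of header.sh and init_foo.sh are invented for illustration:

```shell
# reproduce the concatenation trick in a scratch directory
workdir=$(mktemp -d)
cd "$workdir"

# shared helpers (stand-in for terraform/scripts/header.sh)
cat > header.sh <<'EOF'
log() { echo "[init] $*"; }
EOF

# per-instance script defining the entry point (stand-in for init_foo.sh)
cat > init_foo.sh <<'EOF'
main() { log "initializing $1 $2"; }
EOF

# "echo main foo bar" becomes the final line of the concatenated stream,
# so bash sees: helper definitions, main definition, then "main foo bar"
echo main foo bar | cat header.sh init_foo.sh - | bash
# prints: [init] initializing foo bar
```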
Isolation between Terraform and Packer
There is a philosophy:

- Packer
  - fetches file resources from the Internet
- Terraform
  - injects only runtime information, like IP addresses
  - does not fetch file resources from the Internet
So, every apt-get install is done by Packer, not Terraform. This philosophy is similar to The Twelve-Factor App.
BTW, awk is great
I wanted Terraform to automatically use the AMI ID that Packer had just built. I found that Packer has a -machine-readable output option, whose output is CSV-formatted. So I wrote a processor script that:

- prints the machine-readable output as well as the normal Packer output
- writes an ami.tf file using the AMI ID that was just built
See packer/process. This is my first executable awk script. awk was very nice for this case, and I picked up a lot of awk tips along the way.
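I haven’t reproduced the whole packer/process script, but the core idea can be sketched like this. Packer’s machine-readable artifact lines are comma-separated as timestamp,builder,artifact,index,key,value; the exact shape of the generated ami.tf is my own guess:

```shell
# one captured artifact line from `packer build -machine-readable`
sample='1438988514,amazon-ebs,artifact,0,id,ap-northeast-1:ami-12345678'

echo "$sample" | awk -F, '
  # artifact "id" lines carry the value "region:ami-xxxxxxxx" in field 6
  $3 == "artifact" && $5 == "id" {
    split($6, a, ":")   # a[1] = region, a[2] = AMI ID
    printf "variable \"ami\" { default = \"%s\" }\n", a[2]
  }
' > ami.tf

cat ami.tf
# prints: variable "ami" { default = "ami-12345678" }
```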
To write the complex shell scripts for Packer/Terraform, I read the bashstyle guide. I don’t understand all of it yet, but it contains tons of best practices.
Conclusion
It was fun!
Hey, wait a moment… At the beginning, I just wanted to run many applications on Mesos using Docker, for example WordPress, a MySQL Galera Cluster, etc. To implement that, I needed a shared file system, so I started learning about Ceph, which I’d had my eye on for a while. To deploy the cluster to AWS, I needed some orchestration software, so I started looking at Terraform, and then at Packer to create AMIs… What a long bout of yak shaving it was ;(
Now, let’s start learning about Mesos/Marathon/Docker and many applications!