Vagrant is great… most of the time
Vagrant is a powerful tool for setting up development environments quickly. But just like any other tool, there are use cases where Vagrant works well, and use cases where it doesn’t quite fit. I (Jonathan/@64BitsPerMinute) had a conversation with Eric Pugh (@dep4b) and Chris Bradford (@bradfordcp) about how we used Vagrant on our last project involving fraud detection. It went a little like this…
@dep4b: Morning @64BitsPerMinute, @bradfordcp, thanks for joining me on this conversation! We’re going to revolutionize the world of blogging by having a conversation and posting it! It’s like podcasting, only better, it’s textcasting.
So this project was my first real “in anger” experience with Vagrant. I’ve tried it once or twice, and always been disappointed. So close, and yet it didn’t work. But on this project, I think it was critical to our success, and I just wish we hadn’t rejiggered it so many times over the course of the project!
@64BitsPerMinute: @dep4b, well, in real life, requirements can change, so you have to adapt! But what’s going on?
@64BitsPerMinute: When we were still prototyping for the first two weeks, we actually were using local Cassandra nodes! :O
@dep4b: Right! And then @pfreez did something better.
@64BitsPerMinute: At this point in the project, it was just two people working on the prototype, so @pfreez started us off with a simple ubuntu-based vagrant setup.
@dep4b: Was it using DSE, or was it using open source Cassandra?
@64BitsPerMinute: Well, at first we were using vanilla Cassandra. We realized that we would eventually need the search capabilities of Solr and the computational power of Spark later on down the road, so we made the switch to DSE for its relative convenience.
@dep4b: So yeah, I remember we had two nodes. And since they were on vagrant, we had them each on a separate local IP address. 192.x.x.30 and 192.x.x.31, I remember cause I kept typing them in when connecting via cqlsh.
So what I really remember was @pfreez saying “oh, it’s just fine” when I ran vagrant up, and me thinking, gee, it’s been like 30 minutes, and I’m downloading the contents of the internet! Our vagrant image, it started with a standard blank ubuntu box right?
@64BitsPerMinute: Yeah, unfortunately it did. There are a lot of packages and dependencies to install to get a good development environment set up!
@dep4b: There were definitely some annoyances in building the environment from scratch, not just the time to download java and all the dependencies. Remember we saw that DataStax’s public repo either died, or was super slow for a day? That hurt. And how many times did you forget to set your DSE_USER and DSE_PASS credentials as environment variables to log into the DataStax repo? I always remembered about 10 minutes into the process!
@64BitsPerMinute: Yeah, we were seeing 100 MB packages each take 20+ minutes to download. And setting up passwords just to download stuff was getting annoying.
@dep4b: Guess what I still have in my .bash_profile?
export DSE_USER=epugh_opensourceconnections.com
export DSE_PASS=mypassword
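One way to avoid that ten-minutes-in surprise is a small pre-flight check that runs before provisioning starts. This is a minimal sketch, not something from the project: the function name check_dse_credentials is made up for illustration, and only the DSE_USER and DSE_PASS variable names come from the conversation.

```shell
# Hypothetical pre-flight check; run it before `vagrant up`.
# DSE_USER and DSE_PASS are the variable names mentioned in the conversation.
check_dse_credentials() {
  missing=0
  for name in DSE_USER DSE_PASS; do
    # Indirect lookup of the variable called $name (POSIX-compatible).
    eval "value=\${$name:-}"
    if [ -z "$value" ]; then
      echo "error: $name is not set; export it before running vagrant up" >&2
      missing=1
    fi
  done
  return "$missing"
}

# Usage: only start provisioning when both credentials are present.
# check_dse_credentials && vagrant up
```

Failing in the first second instead of partway through a long download is the whole point; the check itself does nothing clever.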
@dep4b: I think my last complaint was that it also sometimes, for no good reason, just tanked. On the other hand, reading through the Vagrantfile was very good for me to learn how to install DSE, since every step was there, from the prerequisites to the network configuration aspects.
@64BitsPerMinute: At that point, both OSC and our client had added more developers to the project, so things needed to change. At one point, the client’s devops tried to make an entire ISO for their production servers, and then give it to the devs, but obviously throwing around a 40-60+ gig .iso wasn’t a convenient solution. So eventually we figured out a new way to go, and between our two teams, we created a compromise between these two solutions.
@dep4b: What was it?
@64BitsPerMinute: Well, we made the smallest base box possible that was able to support any type of node.
@dep4b: What’s a base box again? That is the “image”?
@64BitsPerMinute: Yup! So if the image isn’t a raw image, but instead has all the main programs you’ll be using already installed, calling ‘vagrant up’ and ‘vagrant destroy’ only takes a few seconds, once you’ve downloaded the base box. And if you use a compact and slim OS (we switched to CentOS for this reason), even your initial download can be small! Ours was only 2.5 gigs!
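For readers who have not used a custom base box, the one-time import step might look roughly like the sketch below. The box name and file name are hypothetical (the conversation never says what the real box was called), and import_base_box is an illustrative helper; only the `vagrant box add --name` command itself is standard Vagrant.

```shell
# Hypothetical names; the conversation does not say what the real box was called.
BOX_NAME="project-base"
BOX_FILE="project-base.box"

# One-time step: register a downloaded .box file under a local name.
import_base_box() {
  if [ ! -f "$BOX_FILE" ]; then
    echo "error: $BOX_FILE not found; download the base box first" >&2
    return 1
  fi
  vagrant box add --name "$BOX_NAME" "$BOX_FILE"
}

# Usage:
# import_base_box
```

After the import, a Vagrantfile that names the same box can create and destroy machines without downloading or installing the main packages again.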
@dep4b: Right, I remember running that import command once. Also, we made some other changes right? Instead of two C* nodes, one of which ALSO ran Solr, we had three separate nodes running, one for C*, one for Solr, and one for Spark?
@dep4b: There were some other nice things that we did as well. Like, instead of having to remember IP addresses, we put those IP addresses into our /etc/hosts file, and gave them shortcut names.
192.168.33.11 dev-dse.application.vm dev-dse
192.168.33.12 dev-spark.application.vm dev-spark
192.168.33.13 dev-solr.application.vm dev-solr
192.168.33.14 dev-web.application.vm dev-web
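Editing the hosts file by hand is easy to script. The sketch below appends the four entries above to a hosts file only when they are missing, so it is safe to run more than once. The add_dev_hosts function is an illustration and was not part of the project; writing to the real /etc/hosts needs root privileges.

```shell
# Sketch: append any missing dev host entries to a hosts file.
# The target defaults to /etc/hosts; pass another path to try it on a scratch file.
add_dev_hosts() {
  hosts_file="${1:-/etc/hosts}"
  while read -r ip fqdn short; do
    # Skip entries whose fully qualified name already appears in the file.
    if ! grep -qF -- "$fqdn" "$hosts_file" 2>/dev/null; then
      printf '%s %s %s\n' "$ip" "$fqdn" "$short" >> "$hosts_file"
    fi
  done <<'EOF'
192.168.33.11 dev-dse.application.vm dev-dse
192.168.33.12 dev-spark.application.vm dev-spark
192.168.33.13 dev-solr.application.vm dev-solr
192.168.33.14 dev-web.application.vm dev-web
EOF
}

# Usage: preview against a scratch copy before touching the real file.
# cp /etc/hosts /tmp/hosts.preview && add_dev_hosts /tmp/hosts.preview
```

The existence check is a plain substring match on the full host name, which is enough for a sketch but would also match a commented-out entry.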
@64BitsPerMinute: Yeah, and for the vagrant VMs we used the vagrant-hosts plugin to copy those DNS entries into each vagrant instance easily.
@dep4b: Then I could do cqlsh dev-dse and jump right to that server. It felt to me a bit clunky to be modifying my hosts file, but after doing it once it was more natural. It would have been nice though if the “vagrant up” process had done that for me!
The other nice thing we got was a working OpsCenter as well. Our initial setup never quite got OpsCenter working, and it was very nice to have that tool as we moved to working with Solr and Spark.
So what didn’t work out perfectly, as I remember some issues still? Sometimes vagrant up failed. My debugging process was first to do vagrant destroy and then vagrant up. If that didn’t work then I typically just rebooted my laptop!
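That destroy-then-up routine can be wrapped in a small helper. This is only a sketch of the manual workflow described here: up_with_retry is a made-up name, its arguments are assumed to be machine names, and `vagrant destroy -f` throws away the VM's state without asking.

```shell
# Sketch of the "destroy, then up again" recovery routine.
# Arguments are machine names (for example: dev-dse); none means all machines.
up_with_retry() {
  if vagrant up "$@"; then
    return 0
  fi
  echo "vagrant up failed; destroying and retrying once" >&2
  # -f skips the confirmation prompt. This discards the VM's current state.
  vagrant destroy -f "$@" && vagrant up "$@"
}

# Usage:
# up_with_retry dev-dse
```

It retries exactly once, so a failure that survives a clean rebuild still surfaces as a non-zero exit status instead of looping.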
@64BitsPerMinute: I never had to do that on Arch Linux 😛 Though I did have to use the sudo vboxreload command every time I booted from scratch to make sure the kernel modules were properly loaded.
@dep4b: I still heart my Mac 😉 I also wasn’t good about suspending my VMs either. I also remember that we started using Talky.io heavily on the project, and the combination of running three 2 GB VMs and Talky would put all 4 of my processors at 100% CPU load!
@bradfordcp: Toward the end our Spark box grew to 4GB. That box wasn’t started every day 😀
@64BitsPerMinute: Oh yeah, suspend is pretty critical if you don’t want your vms to end up in weird states.
@dep4b: So @64BitsPerMinute, what else did we learn? I felt like the week you spent with the sys admin guy to validate the CentOS image was pretty painful. We needed some UI code cranked out, and it was a week of you messing with vagrant images. On the other hand, once it did work, it worked for all of us. And up until that point, I think we might all have had various flavors of DSE and vagrant running…
@64BitsPerMinute: Yeah, he was working on the scripts that customized each of our 4 nodes based on a single box image. There are a lot of little configuration steps that you have to remember to add into scripts when you do more complicated multi-node setups.
@dep4b: So was having separated out nodes good or bad, from a DSE deploy perspective? I remember that we also had the API tier deployed in a node, but as a developer I preferred to run that locally, so I just never spun that node up.
@64BitsPerMinute: While having multiple nodes does lead to increased disk usage (about 5 gigs per node for me), the ease of being able to select which parts of the system are running is awesome for the developer. For the final deploy, the isolation of instances was great for catching otherwise hard to find bugs.
@dep4b: So in the future, what would you add on that we didn’t get to in this project?
@64BitsPerMinute: Well, our scripts weren’t 100% finished, because vbox reload never quite worked properly. This was fine for my personal workflow, because I stuck to vagrant up and vagrant suspend, and only occasionally used vagrant destroy when I really messed things up. But with more time I would have definitely put some more work into making sure the reload feature was working.
@dep4b: And Jonathan, if you had to summarize some lessons learned, what would they be? Would you follow the same path? Go straight to the images being built as VMs?
@64BitsPerMinute: I would stick to the same general workflow, but alter the timings a bit. Starting with vagrant using stock images with init scripts to install packages means that starting your vagrant can take a long time, but as the project changes rapidly in the prototype stage, you can easily alter what’s installed, and have the team synchronized in an otherwise chaotic environment. Once the dependencies have settled down, it’s definitely a good idea to make a custom base box. The initial download time of the base box can be a pain, but the vastly increased speed of creating and destroying boxes definitely leads to enhanced productivity and less downtime.
@dep4b: Yeah, and we were passing the base box around via a VPN connection as well; an S3 bucket would have made it super speedy. We never had to update the base box at all either, so how would we have handled that? @bradfordcp, would vagrant up still have taken care of that?
@bradfordcp: Changing the base box is fine, we would update the image referenced in the Vagrantfile after the base box was updated and a new version was deployed. If you’re using HashiCorp’s Atlas system a new version for a box may be pushed. Then it is just a matter of running vagrant box update on your local machine. See the Vagrant Docs for more info.
Vagrant providers were a little annoying while working on this project. Personally I use both the vmware and virtualbox providers. Not all boxes work with both providers. In this project’s case the base boxes were virtualbox only which required the usage of --provider=virtualbox every time I ran vagrant up. Depending on your setup it may be worth having boxes for multiple providers.
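One possible way around retyping the flag: Vagrant reads the VAGRANT_DEFAULT_PROVIDER environment variable when no provider is given on the command line. The sketch below sets it for the current shell session; whether that fits depends on how many of your other projects rely on a different provider.

```shell
# Vagrant consults VAGRANT_DEFAULT_PROVIDER when no --provider flag is given.
export VAGRANT_DEFAULT_PROVIDER=virtualbox

# With the variable set, a plain `vagrant up` in this shell should behave like:
#   vagrant up --provider=virtualbox
# vagrant up
```

Putting the export in a shell profile would apply it to every project on the machine, so setting it per session or per project directory is probably the safer choice for anyone who switches between providers.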
@64BitsPerMinute: Well, this was a great post-mortem for all the ups and downs of vagrant on this project. Thanks for the chat, everyone!