OpenStack infra-cloud sprint

Last week, members of the OpenStack Infrastructure team focused on getting an infra-cloud into production met from Monday through Thursday at the HPE offices in Fort Collins.

The infra-cloud is an important project for our team, so important that it has a Mission!

The infra-cloud’s mission is to turn donated raw hardware resources into expanded capacity for the OpenStack infrastructure nodepool.

This means that in addition to the companies who Contribute Cloud Test Resources in the form of OpenStack instances, we’ll be running our own OpenStack-driven cloud that provides additional instances to the pool of servers we run tests on. We’re using the OpenStack Puppet Modules (since the rest of our infra uses Puppet) and bifrost, a series of Ansible playbooks that use Ironic to automate deploying a base image onto a set of known hardware.

Our target for infra-cloud was a few racks of hardware provided to the team by HPE, residing in a couple of HPE data centers. When the idea for a sprint came together, we thought it would be nice to host the sprint itself at an HPE site where we could meet some of the folks who handle the servers. That’s how we ended up in Fort Collins, at an HPE office that had hosted several OpenStack mid-cycle and sprint events in the past.

Our event kicked off with an overview by Colleen Murphy of the work that’s been done to date. The infra-cloud team that Colleen is part of has been architecting and deploying the infra-cloud over the past several months, with an eye toward formalizing the process and landing it in our git repositories. Part of the aim of this sprint was to get everyone on the broader OpenStack Infrastructure team up to speed with how everything works, so that the infra cores could intelligently review and provide feedback on the patches being deployed. Colleen’s slides (available here) also gave us an overview of the baremetal workflow with bifrost, the characteristics of the controller and compute nodes, the networking (including differences between the US East and US West regions), and her strategy for deploying locally in a development environment (GitHub repo here). She also spent time getting us up to speed with the HPE iLO management interfaces that we’ll have to use if we run into trouble with provisioning.

This introduction took up our morning. After lunch it was time to talk about our plan for the rest of our time together. We discussed the version of OpenStack we wanted to focus on, whether and how we planned to do upgrades, and the broader goals of the project. It was also important that we build something that could be redeployed whenever we change it; we don’t want this infrastructure to bit rot and cause a major hassle if we need to rebuild the cloud for some reason. We then went through the architecture section of the infra-cloud documentation to confirm that the assumptions there were still accurate, making notes on our etherpad where they were not.

Our discussion then shifted to broad goals for the week; out came the whiteboard! We decided to focus on getting all the patches landed to support US West, so that by the end of the sprint we’d have at least one working cloud. It was during this discussion that we learned how valuable hosting our sprint at an HPE facility was. One of our attendees, Phil Jensen, works in the Fort Collins data center and updated us on the plans for moving systems out of US West. The timeline he was aware of was considerably sooner than we’d been planning for. A call was scheduled for Thursday to sort out those details, and we’re thankful we did, since it turned out we effectively had to be ready to shut down the systems by the end of our sprint.

Goal-setting continued for various sub-tasks, all of which coalesced into the main goal of the sprint: get a region added to Nodepool so we could run a test on it.

Tuesday morning we began tackling our tasks, and at 11:30 Phil came by to give us a tour of the local data center there in Fort Collins. Now, if we’re honest, there was no technical reason for this tour. All the systems engineers on our team have been in data centers before, most of us have even worked in them. But there’s a reason we got into this: we like computers. Even if we mostly interact with clouds these days, a tour through a data center is always a lot of fun for us. Plus it got us out of the conference room for a half hour, so it was a nice pause in our day. Huge thanks to Phil for showing us around.

The data center also had one of the server types we’re using in infra-cloud, the HP SL390. While we didn’t get to see the exact servers we’re using, it was fun to get to see the size and form factor of the servers in person.

Spencer Krum checks out a rack of HP SL390s

Tuesday was spent heads down, landing patches. People moved around the room as we huddled in groups, and there was some collaborative debugging on the projector as we learned more about the deployment, learned a whole lot more about OpenStack itself and worked through some unfortunate issues with Puppet and Ansible.

Not so much glamour: sprints are mostly spent working on our laptops

Wednesday was the big day for us. The morning was spent landing more patches and in the afternoon we added our cloud to the list of clouds in Nodepool. We then eagerly hovered over the Jenkins dashboard and waited for a job to need a trusty node to run a test…
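Adding the cloud to Nodepool at the time amounted to a new provider entry in its YAML configuration, wired into a label that jobs could request. This is a rough, illustrative sketch of what such an entry looked like in that era's format; the names, credentials reference, and limits here are hypothetical, not our actual configuration:

```yaml
# Hypothetical sketch of a nodepool.yaml provider entry; names and limits
# are illustrative, not the real infra configuration.
providers:
  - name: infracloud-west
    cloud: infracloud-west   # auth details come from a clouds.yaml entry
    max-servers: 4           # keep the cap low while shaking out a new cloud
    images:
      - name: ubuntu-trusty
        min-ram: 8192

labels:
  - name: ubuntu-trusty
    image: ubuntu-trusty
    min-ready: 2             # keep a couple of nodes booted and waiting
    providers:
      - name: infracloud-west
```

With a label mapped to the new provider, Nodepool boots ready nodes there and Jenkins jobs asking for that label can be scheduled onto them.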

Slave ubuntu-trusty-infracloud-west-8281553 Building swift-coverage-bindep #2

The test ran! And completed successfully! Colleen grabbed a couple screenshots.

We watch on Clark Boylan’s laptop as the test runs

Alas, it was not all roses. Our cloud struggled to honor delete requests, and the test itself ran considerably slower than we expected. We spent some quality time looking at disk configurations and settings together to see if we could track down the issue and do some tuning. We still have more work to do here to get everything running well on this hardware once it has moved to the new facility.

Thursday we spent some time landing US East patches before the data center moves. We also had a call mid-day to firm up the timing of the move. Our timing for the sprint ended up working out well with the move schedule; we were able to complete a considerable amount of work before the machines had to be shut down. The call was also valuable for chatting with some of the key parties involved and learning what we needed to hand off to them regarding our requirements for the servers’ new home, an HPE POD (cool!) in Houston. This allowed us to kick off a Network Requirements for Infracloud Relocation Deployment thread, and Cody A.W. Somerville captured notes from the rest of the conversation here.

The day concluded with a chat about how the sprint went. The feedback was pretty positive and we all got a lot of work done; Spencer summarized our feedback on-list here.

Personally, I liked that the HPE campus in Fort Collins has wild rabbits. Also, it snowed a little and I like snow.

I could have done without the geese.

It was also enjoyable to visit downtown Fort Collins in the evenings and meet up with some of the OpenStack locals. Plus, at Coopersmith’s I got a beer with a hop pillow on top. I love hops.

More photos from the week here: https://www.flickr.com/photos/pleia2/sets/72157662730010623/

David F. Flanders also tweeted some photos: https://twitter.com/dfflanders/status/702603441508487169