From November 11th through 13th I attended and spoke at Usenix’s LISA15 (Large Installation Systems Administration) conference. I participated in a women in tech panel back in 2012, so I’d been to the conference once before, but this was the first time I submitted a talk. A huge thanks goes to Tom Limoncelli for reaching out to me to encourage me to submit, and I was amused to see my response to his encouragement ended up being the introduction to a blog post earlier this year. LISA has changed!
The event program outlines two main sections of LISA, tutorials and conference. I flew in on Tuesday in order to attend the three conference days from Wednesday through Friday. I picked up my badge Tuesday night and was all ready for the conference come Wednesday morning.
Wednesday began with a keynote from Mikey Dickerson of the U.S. Digital Service. It was one of the best talks I’ve seen all year, and I go to a lot of conferences. Launched just over a year ago (August 2014), the USDS is a part of the US executive office tasked with work and advisement to federal agencies about technology. His talk centered around the work he did after the launch of healthcare.gov. He was working at Google at the time and was brought in as one of the experts to help rescue the website after the catastrophic failed launch. Long hours, a critical 24-hour news cycle that made sure they stayed under pressure to fix it and work to convince everyone to use best practices refined by the industry made for an amusing and familiar tale. The reasons for the failure were painfully easy to predict: no monitoring, no incident response plan or post-mortems, and no formal testing and release process. These things are fundamental to software development in the industry today, and for whatever reason (time? money?) were left off this critical launch. The happy ending was that the site now works (though he wouldn’t go as far as saying it was “completely fixed”) and their success could be measured by the lack of news about the website during the 2014-2015 enrollment cycle. He also discussed some of the other work the USDS was up to, including putting together Requirements for Federal Websites and Digital Services, improvements to VA disability processing and the creation of the College Scorecard.
I then went to see Supercomputing for Healthcare: A Collaborative Approach to Accelerating Scientific Discovery (slides linked on that page) presented by Patricia Kovatch of the Icahn School of Medicine at Mount Sinai. She started off by talking about the vast amounts of data collected by facilities like Mount Sinai and how important it is to have that data accessible and mine-able by researchers who are looking for cures to health problems. Then she dove into collaboration, the keystone of her talk, bringing up several important social points. Even as a technologist, you should understand the goals of everyone you work with, from the mission statement of your organization to yourself, your management, your clients and the clients (or patients!) served by the organization. Communication is key, and she recommended making non-tech friendly visualizations (that track metrics which are important – and re-evaluate those often), monthly reports and open meetings where interested parties can participate and build trust in your organization. She also covered some things that can be done to influence user behavior, like creating a “free” compute queue that’s lower priority but that a department doesn’t need to pay for, to encourage usage of that rather than taking over the high priority queue for everything (because everyone’s job is high priority when it’s all the same to them…). In case it’s not obvious, there was a lot of information in this talk squeezed into her time slot! I can’t imagine any team realistically going from having a poorly communicating department to adopting all of these suggestions, but she does present a fantastic array of helpful ideas that can be implemented slowly over time, each of which would help an organization. The slides are definitely worth a browse.
Next up was my OpenStack colleague Devananda van der Veen, who was talking about Ironic: A Modern Approach to Hardware Provisioning. Largely divorcing Ironic from OpenStack, he spent the talk on how to use it as a standalone tool for hardware provisioning. But he did begin by talking about how tools like OpenStack have started handling VMs, which themselves are abstractions of computers, and that Ironic takes that one step further: instead of a VM, you have hardware that’s an abstraction of a computer, thus putting bare metal and VMs on similar footing abstraction-wise with tooling in OpenStack with Ironic. He spent a fair amount of time talking about how much effort has been put in by hardware manufacturers into writing hardware drivers, and how quickly adoption in production has taken off with companies like Rackspace and Yahoo! being very public about their usage.
The hallway track was strong at this conference! The next talk I attended was in the afternoon, The Latest from Kubernetes by Tim Hockin. As an open source project, I feel like Kubernetes has moved very quickly since I first heard about it, so this was a really valuable talk that skipped over introductory details and went straight to talking about new features and improvements in version 1.1. There’s iptables kube-proxy (yay kernel!), support for a level 7 loadbalancer (Ingress), namespaces, resource isolation, quota and limits, network plugins, persistent volumes, secrets handling and an alpha release of daemon sets. And his talk ran long, so he wasn’t able to get to everything! Slides, all 85 of them, are linked to the talk page and are valuable even without the accompanying talk.
My day wrapped up with My First Year at Chef: Measuring All the Things by Nicole Forsgren, the Director of Organizational Performance & Analytics at Chef. Nicole presented a situation where she joined a company that wanted to do better tracking of metrics within a devops organization and outlined how she made this happen at Chef. The first step was just talking about metrics: do you have them? What should you measure? She encouraged making sure both dev and ops were included in the metrics discussions so you’re always on the same page and talking about the same things. In starting these talks, she also suggested the free ~20 page book Data Driven: Creating a Data Culture for framing the discussions. She then walked through creating a single page scorecard for the organization about key things they want to see happen or improve: pick a few key things and then work out how to set targets and measure progress and success. Benchmarks were also cited as important, so you can see how you’re doing compared to where you began and more generally in the industry. Advice was also given about what kinds of measurement numbers to look at: internal, external, cultural and whether subjective or objective makes the most sense for each metric, and how to go about subjective measuring.
I had dinner with my local friend Mackenzie Morgan. I hadn’t seen her since my wedding 2.5 years ago, so it was fun to finally spend time catching up in person, and offered a stress-free conclusion to my first conference day.
The high-quality lineup of keynote speakers continued on Thursday morning with Christopher Soghoian of the ACLU who came to talk about Sysadmins and Their Role in Cyberwar: Why Several Governments Want to Spy on and Hack You, Even If You Have Nothing to Hide. He led with the fact that many systems administrators are smart enough to know how to secure themselves, but many don’t take precautions at home: we use poor passwords, don’t encrypt our hard drives, etc. I’m proud to say that I’m paranoid enough that I actually am pretty cautious personally, but I think that stems from being a hobbyist first; it’s always been natural for my personal stuff to be just as secure as what I happen to be paid to work on. With that premise, he dove into government spying that was made clear by Snowden’s documents and high profile cases of systems administrators and NOC workers being targeted personally to gain control of the systems they manage either through technical means (say, sloppy ssh key handling), social engineering or stalking and blackmail. Known targets have been people working for the government and sysadmins at energy and antivirus companies, but he noted any of us could be a target if the data we’re responsible for administering is valuable in any way. I can’t say any of the information in the talk was new to me, but it was presented in a way that was entertaining and makes me realize that I probably should pay more attention in my day to day work. Bottom line: Even if you’re just an innocent, self-proclaimed boring geek who goes home and watches SciFi after work, you need to be vigilant. See, I have a reason to be paranoid!
I picked up talks in the afternoon by attending one on fwunit: Unit Testing and Monitoring Your Network Flows with Fwunit by Dustin J. Mitchell. The tool was specifically designed for workflows at Mozilla so only a limited set of routers and switches are supported right now (Juniper SRX, AWS, patches welcome for others), but the goal was to be able to do flow monitoring on their network in order to have a good view into where and how traffic moved through their network. They also wanted to be able to do this without inflexible proprietary tooling and in a way that could be scripted into their testing infrastructure. Did a change they make just cut off a bunch of traffic that is needed by one of their teams? Alert and revert! Future work includes improvements to tracking ACLs, optimized statistic gathering and exploring options to test prior to production so reverts aren’t needed.
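To make the idea of unit testing network flows a little more concrete, here is a minimal sketch in Python. It does not use fwunit or its API; the `Rule` type, the `flow_allowed` helper and the addresses are all hypothetical, invented purely for illustration. It only shows the general shape: describe the flows your teams depend on as assertions, and run them whenever the rules change.

```python
from ipaddress import ip_address, ip_network
from typing import NamedTuple


class Rule(NamedTuple):
    """One simplified firewall rule (hypothetical format, not fwunit's)."""
    src: str     # source network in CIDR notation
    dst: str     # destination network in CIDR notation
    port: int    # destination TCP port
    allow: bool  # True to permit the flow, False to block it


def flow_allowed(rules, src_ip, dst_ip, port):
    """Return the verdict of the first matching rule; deny if nothing matches."""
    for rule in rules:
        if (ip_address(src_ip) in ip_network(rule.src)
                and ip_address(dst_ip) in ip_network(rule.dst)
                and port == rule.port):
            return rule.allow
    return False


# Made-up policy: build hosts may reach the artifact store over HTTPS,
# and nobody may SSH to it.
RULES = [
    Rule("10.1.0.0/16", "10.2.0.10/32", 443, True),
    Rule("0.0.0.0/0", "10.2.0.10/32", 22, False),
]


def test_build_hosts_reach_artifact_store():
    assert flow_allowed(RULES, "10.1.4.7", "10.2.0.10", 443)


def test_ssh_to_artifact_store_is_blocked():
    assert not flow_allowed(RULES, "10.1.4.7", "10.2.0.10", 22)
```

Tests written this way can be picked up by a runner such as pytest, so a rule change that cuts off a needed flow fails a test instead of surprising a team in production.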
Keeping with the networking thread, Dinesh G Dutt of Cumulus Networks spoke next on The Consilience Of Networking and Computing. The premise of his talk was that the networking world is stuck in a sea of proprietary tooling that isn’t trivial to use and the industry there is losing out on a lot of the promises of devops since it’s difficult to automate everything in an effective manner. He calls for a more infrastructure-as-code-driven plan forward for networking and cited places where progress is being made, like in the Open Compute Project. His talk reminded me of the OpenConfig working group that an acquaintance has been involved with, so it does sound like there is some consensus among network operators about where they want to see the future go.
The final talk I went to on Thursday was Vulnerability Scanning’s Not Good Enough: Enforcing Security and Compliance at Velocity Using Infrastructure As Code by Julian Dunn. He was preaching to the choir a bit as he introduced how useless standard vulnerability scanning is to us sysadmins (“I scanned for your version of Apache, and that version number is vulnerable” “…do you not understand how distro patches work?”) and expressed how challenging the scan results are to keep up with. His proposal was twofold. First, that companies get more in the habit of prioritizing security in general rather than passing arbitrary compliance tests. Second, to consolidate the tooling used by everyone and integrate it into the development and deployment pipeline to make sure security standards are adhered to in the long run (not just when the folks testing for compliance are in the building). To this end, he promoted use of the Chef Inspec Project.
Thursday evening was the LISA social, but I skipped that in favor of a small dinner I was invited to at a local Ethiopian restaurant. Fun fact: I’ve only ever eaten Ethiopian food when I’m traveling, and the first time I had it was in 2012 when I was in San Diego, following my first LISA conference!
The final day of the conference began with a talk by Jez Humble on Lean Configuration Management. He spent some time reflecting on modern methodologies for product development (agile, change management, scrum), and discussed how today with the rapid pace of releases (and sometimes continuous delivery) there is an increasing need to make sure quality is built in at the source and bugs are addressed quickly. He then went into the list of very useful indicators for a successful devops team:
- Use of revision control
- Failure alerts from properly configured logging and monitoring
- Developers who merge code into trunk (not feature branches! small changes!) daily
- Peer review driven change approval (not non-peer change review boards)
- Culture that exhibits the Generative organizational structure as defined by R Westrum in his A typology of organisational cultures
He also talked a fair amount about team structures and the risks when not only dev and ops are segregated, but also product development and others in the organization. He proposed bringing them closer together, even putting an ops person on a dev team and making sure business interests and goals in the product are also clearly communicated to everyone involved.
It was a pleasure to have my talk following this one, as our team strives to tick off most of the boxes when it comes to having a successful team (though we don’t really do active, alerting monitoring). I spoke on Tools for Distributed, Open Source Systems Administration (slides linked on the linked page) where I walked through the key strategies and open source tools we’re using as a team that’s distributed geographically and across time zones. I talked about our Continuous Integration system (the heart of our work together), various IRC channels we use for different purposes (day to day sync-up, meetings, sprints, incidents), use of etherpads for collaborative editing and work and how we have started to address hand-offs between time zones (mostly our answer is “hire more people in that time zone so they have someone to work with”). After my talk I had some great chats with folks either doing similar work, or trying to nudge their organization into being productive across offices. The talk was also well attended, so huge thanks to everyone who came out to it.
At lunch time I had a quick meal with Ben Cotton before sneaking off to the nearby zoo to see if I could get a glimpse of the pandas. I saw a sleeping panda. I was back in time for the first talk after lunch, Thomas A. Limoncelli on Transactional System Administration Is Killing Us and Must be Stopped. Many systems administrators live in a world of tickets. Tickets come in, they are processed, and we’re always stressed because we have too many tickets and are always running around to get them done with poor tooling for priority (everything is important!). It also leads to a very reaction-driven workflow that leaves little room for fixing fundamental long term issues, and long term planning is very hard. It also creates a bad power dynamic: sysadmins begin to see users as a nuisance, and users are always waiting on those sysadmins in order to get their work done. Plus, users hate opening tickets and sysadmins hate reading tickets opened by users. Perhaps worst of all, we created this problem by insisting upon usage of ticketing systems in the 90s. Whoops. In order to solve this, his recommendations are very much in line with what I’d been hearing at the conference all week: embed ops with dev, build self-service tooling so repeatable things are no longer manually done by sysadmins (automate, automate, automate!), and have developers write their own monitors for their software (ops don’t know how it works, the devs do, they can write better monitoring than just pinging a server!). He also promoted the usage of Kanban and building your team schedule so that there is a rotating role for emergencies and others are able to focus on long term project work.
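As a rough illustration of that rotating emergency role, here is a small Python sketch. It is my own toy example rather than anything shown in the talk, and the team names, start date and week-long shifts are all assumptions: one person takes interrupts each week while everyone else is left free for project work.

```python
from datetime import date, timedelta
from itertools import cycle, islice


def oncall_rotation(team, start, weeks):
    """Pair each week's start date with the person handling interrupts that week."""
    return [
        (start + timedelta(weeks=i), person)
        for i, person in enumerate(islice(cycle(team), weeks))
    ]


if __name__ == "__main__":
    team = ["ana", "bo", "chris"]  # hypothetical team members
    for week_start, person in oncall_rotation(team, date(2015, 11, 16), 4):
        # Everyone who is not on interrupt duty gets uninterrupted project time.
        others = [p for p in team if p != person]
        print(week_start, "interrupts:", person, "| project work:", ", ".join(others))
```

A real schedule would also need to handle holidays and swaps, but even a simple round-robin like this makes it explicit who is interruptible in a given week and who is not.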
The final talk of the main conference I attended was The Care and Feeding of a Community by Jessica Hilt. I’ve been working with communities for a long time, even holding some major leadership positions, but I really envy the experience that Jessica brought to her talk, particularly since she’s considerably more outgoing and willing to confront conflict than I am. She began with an overview of different types of communities and how their goals matter so you can collect the right group of people for the community you’re building. She stressed that a goal like cooperative learning (educational, tech communities, beyond) is a valuable use of a group’s time, helps build expertise and encourages retention when members are getting value. Continuing on a similar theme, networking and socialization are important, so that people have a bond with each other and provide a positive feedback loop that keeps the community healthy. During a particularly amusing part of her talk, she also mentioned that you want to include people who complain, since the complainers are often passionate about the group topic but just grumpy, and they can be a valuable asset. Once you have ideas and potential members identified, you can work on organizing. What are the best tools to serve this community? What rules need to be in place to make sure people are treated fairly and with respect? She concluded by talking about long term sustainability, which includes re-evaluating the purpose of the group from time to time, making sure it’s still attracting new members, confirming that the tooling is still effective and that the rules in place are being enforced.
During the break before the closing talks of the conference I had the opportunity to meet the current Fedora Project Lead, Matthew Miller. Incidentally, it was the same day that my tenure on the Ubuntu Community Council officially expired, so we were able to have an interesting chat about leadership and community dynamics in our respective Linux distributions. We have more in common than we tend to believe.
The conference concluded with a conference report from the LISA Build team that handled the network infrastructure for the conference. They presented all kinds of stats about traffic and devices and stories of their adventures throughout the conference. I was particularly amused when they talked about some of the devices connecting, including an iPod. I couldn’t have been the only one in the audience brainstorming what wireless devices I could bring next year to spark amusement in their final report. They then handed it off to a tech-leaning comedian who gave us a very unusual, meandering talk that kept the room laughing.
This is my last conference of the year and likely my last talk, unless someone local ropes me into something else. It was a wonderful note to land on in spite of being tired from so much travel this past month. Huge thanks to everyone who took time to say hello and invite me out; it went a long way to making me feel welcome.
More photos from the conference here: https://www.flickr.com/photos/pleia2/sets/72157660670374520