MesosCon EU 2017

I attended my first MesosCon back in September in Los Angeles. In October I had the pleasure of participating in my second, this time the European version in Prague. As I’ve mentioned previously, this was a busy week for me. My participation in All Things Open kicked off the week, and I landed in Prague on Wednesday evening with just enough time to check into my room, say hello to some folks I knew who were in town for the Open Source Summit and then head off to a planned dinner with work folks at Restaurant Mlýnec.

The next morning arrived quickly as I was up and available for a 7:30AM breakfast with some of the MesosCon Europe keynote speakers. Part of the breakfast was spent chatting with the panelists who would join me on stage that morning to participate in a panel on “SMACK in the Enterprise” that I was moderating. At 8:30 it was time to go down to the keynote room and see that everything was on schedule for our 9AM start. The folks at the Linux Foundation do a great job coordinating these events, and it was a pleasure working with them as a speaker and track lead throughout the event.

At 9AM Ben Hindman opened the conference with a talk on the current state of Apache Mesos, reflecting on past MesosCons and the past year of developments. Improvements in the platform have included things like nesting of containers, and the creation of the Container Storage Interface (CSI). Fault domains and the promise of multi-cloud also made an appearance in his keynote.

Directly following Ben’s talk was one that, as a pure open source enthusiast, made me really happy to attend. Rich Bowen of the Apache Software Foundation came to talk to us about The Apache Way. Given my background and current role, I’m familiar with the history of the foundation and loosely keep tabs on their current work. Still, seeing a history presented with a message is always an enjoyable way to consume it, and Rich did a masterful job weaving the history in with where we are today, and explaining what “Apache” means in the Apache Mesos project. From bottom-up leadership to collaborative decision-making and the way they approach conflict resolution, there’s a lot to admire. He also drove home the importance of transparency in a project, with no decisions being made privately or in ways that are difficult to document for future participants of the project. He also touched upon where the foundation was today, with over 100 projects within the Apache Software Foundation and the Apache License remaining strong in the industry.

The SMACK (Spark, Mesos, Akka, Cassandra, Kafka) in the Enterprise panel was next, and as moderator I was thrilled to be joined by representatives from Audi, Deutsche Telekom and ASML. From connected cars to how they’re making innovations on mobile networks, learning how companies are using Mesos and the entire stack to do fast data processing really brings all the work we do into focus. Innovation is happening across various industries as we all become more familiar with what we’ll need to succeed in a world with so much data coming at us, and I’m proud to be a part of it.

Thanks to the Apache Community for taking this photo (source)

The keynotes concluded with one from Netflix Engineering Director, Katharina Probst. Internally at Netflix they’ve built a massive infrastructure that is, at a high level, managed day to day by a relatively small team of Site Reliability Engineers. This has been accomplished through tooling they built called Mantis. It provides the autoscaling that a company like Netflix requires, with viewing numbers dropping considerably during each region’s work day and picking up in the evening hours, as well as real-time testing and metrics from their platform. The key for them has been not only monitoring to detect that there is a problem somewhere, but finding out exactly where it is and reporting to the engineers what the problem may be, so a fix can be developed without a long session of troubleshooting first. As someone who used to work in operations, this was something I could really appreciate. I’ve spent many long evenings chasing down problems, evenings that could have been made much shorter with the addition of some automation to rule out the usual suspects quickly. The scale of their operations also never ceases to amaze me, and she was able to share some statistics around that as well.

This first day of the conference landed me as the track lead for the DevOps and Ops track. The day began with a talk from David vonThenen around external storage. In this talk he gave a cost/benefit evaluation of local versus external storage, and then, not keeping to just files, he ran the same evaluation when looking at storage for databases, both traditional like Postgres and MySQL and the newer distributed databases. Tying this in to Mesos, he touched upon CSI work and mentioned that REX-Ray is currently being used by DC/OS to handle the attachment of external volumes. These were interesting considerations in what has been a quickly growing part of our ecosystem as demands upon reliable storage solutions for containers quickly increase. His talk was followed by one from Ádám Sándor titled “Knee Deep in Microservices”, which used a Doom theme to demonstrate the new “demons” we have let loose while taking all the benefits of migrating to microservices. He cited the key elements of DevOps, resilience, elasticity, resource abstraction, and tooling that helps us monitor containers as key improvements in the microservices platforms that are helping us tame the demons we have unleashed. It was also nice for me to hear these things. These massive distributed systems are complicated, and we need to be continually improving our toolkit (weapons!) to effectively manage them.

After lunch Adam Bordelon and Alexander Rojas joined the track to give a popular talk titled “Mesos Security Exposed!” I won’t enumerate every point here, but by digging into how Mesos works, they were able to pull back the curtain and explain what needed securing in your cluster, from API endpoints to use of TLS and the handling of private data (secrets). Julien Stroheker then spoke on “Doing Real DevOps with DC/OS” where he gave a live demonstration of a continuous integration pipeline with Jenkins, Docker, Gradle and Vamp. The sessions for the day concluded with Zain Malik giving us a tour of mesos2iam, “IAM Credentials for Containers Running Inside a Mesos Cluster”.

But the day was not over! Appetizers and drinks at the attendee reception were directly followed by Town Halls, where attendees could casually gather in groups to talk about Mesos, Marathon, DC/OS or Kubernetes. My colleague Matt led the DC/OS session, with me taking notes and pitching in here and there. Our Town Hall began with introductions. With 15-20 people in attendance as the 90 minute evening session progressed, we had a nice mix of people from a variety of unrelated industries. With the ice broken, it was easy to get people talking about pain points they had with DC/OS and we had Adam Bordelon in the room to give history and insight into specific features and concerns that people were bringing up. At the end of the session we had a nice list of shared struggles, but the attendees also were able to swap knowledge with each other. Empathy goes a long way, and in my own work I know how valuable it is to know that concerns you have are being shared and solved by others too.

Attendees file in for the DC/OS town hall Thursday evening

The Town Halls wrapped up at 8:30 and I headed straight to bed. Between the jet lag and the 13+ hour day, I was exhausted.

On Friday the keynotes also began at 9AM, but the keynote slot was shorter in order to squeeze more track sessions in during the rest of the day.

After a welcome and opening remarks from Jörg Schad, the first keynote came from Yrieix Garnier who gave a more Enterprise-focused take on “The Future of Apache Mesos and DC/OS” than Ben had explored in his keynote the day before. He was able to pull up statistics about the data-processing power of the platforms, sharing that 50% of DC/OS clusters were running some form of the SMACK or ELK stacks. The big news from his talk was the unveiling of TensorFlow support in DC/OS. We then had Pierre Cheynier join us to share a talk that he had originally proposed during our CFP and that we upgraded to a keynote, “Operating 600+ Mesos Servers on 7 Datacenters @Criteo”. The ability to scale is a key feature of Mesos, so it was fascinating to learn about the scale of their work and just how much data they were storing (171PB on Hadoop!). He also shared a series of tips for other organizations looking to operate at this scale, including effectively automating everything (configuration management, scaling, CI system), defensive configuration (things will go wrong, be prepared), visibility to operations as to what is going on (metrics, alerts, tracing), and the importance of doing networking right, and addressing problems like QoS and “noisy neighbors” during design. He also covered some incidents which gave further insight into the sorts of things they were able to effectively prevent, or not.

In a slight shift from Thursday, I spent Friday as the track lead for the Users and Ops track. The first talk came from the oldest company in the lineup, the publisher Houghton Mifflin Harcourt. I was looking forward to this talk, and as Robert Allen got into the details of “How HMH Went from Months to Minutes for Infrastructure Delivery” I was not disappointed. In a common theme for the day, he brought up how the slow, inconsistent, old technology team was not keeping up with either the industry or their own product lines, as the publishing industry is rapidly changing to serve a more tech savvy customer base. He walked us through the creation of their “Bedrock tech services” team, and their DevOps-focused goals, including comprehensive developer involvement from idea to production, a continuous delivery approach that encouraged small, frequent updates, and a change in culture that made them shift from feeling like they must prevent failure to instead acting as if failure will occur and planning accordingly. He then dove into the technologies used. Beyond Apache Mesos, they’ve also been using Apache Aurora, Terraform for orchestration, Vault for secrets, and a Jenkins plus Artifactory CI/CD pipeline. He also stressed the importance of metrics and logging, all things close to my own interests as well!

Tim Nolet then joined us to give a talk on “Advanced Deployment Strategies and Workflows for Containerized Apps on DC/OS” where he too walked us down the painful memory lane of the massive, error-prone deployments of the 2000s, the rise of DevOps and today the productization of a lot of the DevOps tooling, including Vamp, which he helps develop. As a product, Vamp seeks to tame the ecosystem of microservices you have running by simplifying the process of deployments. He then showed a demo of load being distributed across different versions of a deployed application, as well as different versions being served up to customers using different clients.

In our next user story, we heard from Jay Chin on “Optimising Mesos Utilization at Opentable”. He began his talk with a quick production history of the infrastructure at OpenTable, sharing that they made the move to microservices in 2013. During this time they were orchestrating their microservices via standard configuration management tooling, a process that turned out to be not only difficult to maintain at scale, but also actively disliked by the engineers who had to work on it. In 2014 they switched to Mesos, and through resource abstraction and running a consolidated cluster, they were able to simplify application-level operations. Additionally, it allowed them to easily create centralized metrics and logging. It’s a story I’d heard before from other companies, and one I was thrilled to hear again, but where OpenTable really led the way here was by open sourcing some interesting tools they were using, including several he mentioned during his talk.

The day continued with Ilya Dmitrichenko on “Time Traveling in the Universe of Microservices and Orchestration” where he used his own career as a baseline for the changes he was seeing in the industry with regard to the rise of microservices. His dabbling with old Sun boxes and awareness of things like OpenVZ is consistent with some of my own hobbyist and day job work early in my career. Indeed, containers have always been with us. He went on to track the rise of Docker simplifying the space and tooling coming together to make management of a microservices infrastructure feasible for smaller organizations. The culmination of this story was his current work at Weaveworks which, just like Vamp improves deployments, improves and simplifies your network policy.

Jorge Salamero of Sysdig joined us next with the amusingly titled talk, “WTF, My Container Just Spawned a Shell”. I was immediately fond of this talk because after a sea of Macs, Jorge uses Xubuntu. He began his presentation by talking about how we consume container images, and the rise in static scanning of these images for libraries with vulnerabilities and more, but that this only goes so far. These scans need constant updating and so miss things, which behavioral analysis of the running container will catch. He introduced how Sysdig’s metrics tooling used with Falco can give you a comprehensive view inside the containers you’re running. Suddenly you have access to security tracking that can show you every command taking place inside of your containers, and from there you can train it to be aware of problem behavior. He also talked about Sysdig Inspect, the open source project that they have built their Sysdig Secure product with.

Julien Stroheker then joined us again to talk about his DC/OS autoscaler. This is a talk I’d already seen twice, so I won’t go into it, but it is a cool project that he’s always looking for help on! The final talk of the day came to us from Fredrik Lindner of Tunstall Nordic AB, who shared “There and Back Again: How Tunstall Healthcare Built an IoT Platform for Health Monitoring Using Mesos Cluster on Azure.” Just like so many other industries, elder care is undergoing a transformation with the help of IoT technology. He shared details of their “Evity” platform they were developing on top of DC/OS to help manage the data coming in from these devices so they can effectively and quickly meet the needs of the families they work with. It was a good story to end with, as with much of the technology we saw showcased throughout this MesosCon, as consumers we just assume it will all work. My FitBit will reliably track my steps and share them with my friends, my car will show me traffic, and when something goes wrong it’s unexpected and unsettling. This is even more pronounced when you get into the space of technologies that help with things like healthcare, where reliability, accuracy and speed have even more urgency. We’re building the platform that people are using to make sure these all run well, and that’s pretty exciting.

When I finally stepped out of that track, all the booths had been taken down and most of the conference attendees had gone their own ways. I was able to finally leave the hotel with Matt and Jörg to explore some of the old city in what turned into a lovely night that thankfully didn’t keep me out too late. The conference was great, but predictably exhausting, especially coming on the heels of All Things Open on another continent.

Huge thanks to everyone involved in organizing, and to all the speakers in my tracks who made the event really interesting by sharing their stories, tools and expertise. This event was a little smaller than the one in Los Angeles, but it didn’t feel like it, and the quality of the event was top notch.

More of my photos from the event can be found here: https://www.flickr.com/photos/pleia2/albums/72157689885991786 and more photos, slides and videos are hosted by the Linux Foundation at http://events.linuxfoundation.org/events/mesoscon-europe