I’m finally home for a month, so I’ve taken advantage of some of this time to attend and present at some local events. The first of which was Elastic{ON}, the first user conference for Elasticsearch and related projects now under the Elastic project umbrella. The conference venue was Pier 27, a cruise terminal on the bay. It was a beautiful venue with views of the bay, and clever use for a terminal while there aren’t ships coming in.
The conference kicked off with a keynote where they welcomed attendees (of which there were over 1300 from 35 countries!) and dove into project history from the first release in 2010. A tour of old logos and websites built up to the big announcement, the “Elastic” rebranding, as the scope of their work now goes beyond search in the former Elasticsearch name. The opening keynotes continued with several leads from projects within the Elastic family, including updates from Logstash and Kibana.
At lunch I ended up sitting with 3 other women who were attending the conference on behalf of their companies (when gender ratios are skewed, this type of congregation tends to happen naturally). We all got to share details about how we were using Elasticsearch, so that was a lot of fun. One woman was doing data analysis against it for her networking-related work, another was using it to store metadata for videos and the third was actually speaking that afternoon on how they’re using it to supplement the traditional earthquake data with social media data about earthquakes at the USGS.
Track sessions began after lunch, and I spent my afternoon camped out in the Demo Theater. The first talk was by the Elastic Director of Developer Relations, Leslie Hawthorne. She talked about the international trio of developer evangelists that she works with, focusing on their work to support and encourage meetup groups worldwide, noting that 75 cities now have meetups with a total of over 17,000 individual members. She shared some tips from successful meetup groups, including offering a 2nd track during meetups for beginners, using an unconference format rather than set schedule and mixing things up sometimes with hack nights on Elastic projects. It was interesting to learn how they track community metrics (code/development stats, plus IRC and mailing list activity) and she wrapped up by noting the new site at https://www.elastic.co/community where they’re working to add more how-tos and on-ramping content, which their recent acquisition of Found, which has maintained a lot of that kind of material.
Leslie Hawthorn on “State of the Community”
The next session was “Elasticsearch Data Journey: Life of a Document in Elasticsearch” by Alexander Reelsen & Boaz Leskes. When documents enter Elasticsearch as json output from a service like Logstash, it can seem like a bit of a black box as far as what exactly happens to it in order for it to be added to Elasticsearch. This talk went through what happens. It’s first stored in Elasticsearch, where it’s stored node-wise is based on several bits of criteria analyzed upon bringing in, and the data is normalized and sorted. While the data is coming in, it’s stored in a buffer and also written to a transaction log until it’s actually committed to disk, at which time it’s still in the transaction log until it can be replicated across the Elasticsearch cluster. From there, they went into discussing data retrieval, cluster scaling and while stressing that replication is NOT backups, how to actually do backups of each node and how to restore from them. Finally, they talked about the data deletion process and how it queues data for deletion on each node in segments and noted that this is not a reversible option.
Continuing in “Life of” theme, I also attended “Life of an Event in Logstash” by Colin Surprenant. Perhaps my favorite talk of the day, Colin did an excellent job of explaining and defining all the terms he used in his talk. Contrary to popular belief, this isn’t just useful to folks new to the project, but as a systems administrator who maintains dozens of different types of applications over hundreds of servers, I am not necessarily familiar with what Logstash in particular calls everything terminology-wise, so having it made clear during the talk was great. His talk walked us through the 3 stages that events coming into Logstash go through: Input, Filter and Output, and the sized queues between each of them. The Input stage takes whatever data you’re feeding into Logstash and uses plugins to transform it into a Logstash event. The Filter stage actually modifies the data from the event so that the data is made uniform. The Output stage translates the uniform data into whatever output you’re sending it to, whether it’s STDOUT or sending it off to Elastisearch as json. Knowing the bits of this system is really valuable for debugging loss of documents, I look forward to having the video online to share with my colleagues. EDIT 3/20/2015: Detailed slides online here.
Colin Surprenant on “Life of an Event in Logstash”
I tended to a avoid many of the talks by Elasticsearch users talking about how they use it. While I’m sure there’s valuable insights to be gained by learning how others use it, we’re pretty much convinced about our use and things are going well. So use cases were fresh to me when the day 2 keynotes kicked off with a discussion with Don Duet, Co-head of Technology at Goldman Sachs. It was interesting to learn that nearly 1/3 of the employees at Goldman Sachs are in engineering or working directly with engineering in some kind of technical analysis capacity. They were also framed as very tech-conscious company and long time open source advocate. In exploring some of their work with Elasticsearch he used legal documents as an example: previously they were difficult to search and find, but using Elasticsearch an engineer was empowered to work with the legal department to make the details about contracts and more searchable and easier to find.
The next keynote was a surprising one, from Microsoft! As a traditional proprietary, closed-source company, they haven’t historically been known for their support of open source software, at least in public. This has changed in recent years as the world around has changed and they’ve found themselves needing to not only support open source software in their stacks but contributing to things like the Linux kernel as well. Speaker Pablo Castro had a good sense of humor about this all as he walked attendees through three major examples of Elasticsearch use at Microsoft. It was fascinating to learn that it’s used for content on MSN.com, which gets 18 billion hits per month. They’re using Elasticsearch on the Microsoft Dynamics CRM for social media data, and in this case their actually using Ubuntu as well. Finally, they’re using it for the search tool in their cloud offering, Azure. They’ve come a long way!
Pablo Castro of Microsoft
The final keynote was from NASA JPL. The room was clearly full of space fans, so this was a popular presentation. They talked about how they use Elasticsearch to hold data about user behavior from browsers on internal sites so they can improve them for employees. They also noted the terribly common practice of putting data (in this case, for the Mars rover) into Excel or Powerpoint and emailing it around as a mechanism for data sharing, and how they’ve managed to get this data into Elasticsearch instead, clearly improving the experience for everyone.
After the keynotes, it time to do my presentation! The title of my talk was “elastic-Recheck Yourself Before You Wreck Yourself: How Elasticsearch Helps OpenStack QA” and I can’t take credit for the title, my boring title was replaced by a suggestion from the talk selection staff. The talk was fun, I walked through our use of Elasticsearch to power our elastic-recheck (status page, docs) tooling in OpenStack. It’s been valuable not only for developer feedback (“your patch failed tests because of $problem, not your code”), but by giving the QA an Infrastructure teams a much better view into what the fleet of test VMs are up to in the aggregate so we can fix problems more efficiently. Slides from my talk are here (pdf).
All set up for elastic-Recheck Yourself Before You Wreck Yourself
Following my talk, ended up having lunch with the excellent Carol Willing. We got to geek out on all kinds of topics from Python to clouds as we enjoyed an outdoor lunch by the bay. Until it started drizzling.
The most valuable talk in the afternoon for me was “Resiliency in Elasticsearch and Lucene” with Boaz Leskes & Igor Motov. They began by talking about how with scale came the realization that more attention needed to be paid to recovering from various types of failures, and that they show up more often when you have more workers. The talk walked through various failures scenarios and how they’ve worked (and are working) on making improvements in these areas, including “pulling the plug” for a full shutdown, various hard disk failures, data corruption, and several types of cluster and HA failures (splitbrain and otherwise), out of memory resiliency and external pressures. This is another one I’m really looking forward to the video from.
The event wrapped up with a panel from GuideStar, kCura and E*Trade on how they’re using Elasticsearch and several “war stories” from their experiences working with the software itself, open source in general and Elastic the company.
In all, the conference was a great experience for me, and it was an impressive inaugural conference, though perhaps I should have expected that given the expertise and experience of the community team they have working there! They plan on doing a second one, and I recommend attendance to folks working with Elasticsearch.
More of my photos from the conference here: https://www.flickr.com/photos/pleia2/sets/72157650940379129/