elastic-recheck
* Session proposed by Matthew Treinish
* Thursday, November 7, 2013, 11:50am - 12:30pm
* Design Summit 1
* Infrastructure

Late in the Havana cycle we introduced elastic-recheck, which uses logstash to find known gate bugs when a test run fails. In the short time the bot has been running it has been incredibly useful, both at identifying known repeating failures and at finding new bugs. Moving forward, there are improvements and features we'd like to see added to the bot, including but not limited to:

* ZeroMQ (or Gearman, or whatever) triggering from logstash when a test's logs are ready
  * currently it watches Gerrit to see when the logs have finished copying
  * have the logstash Gearman workers notify the logstash Gearman client when logs are done being processed; the Gearman client can then notify elastic-recheck that all files for the job are complete
* a different method of storing/managing queries (Launchpad, in a db, separate git repo, etc.) - vote no
  * some easy way to link directly to logstash functional tests
  * file per bug
* reworking the recheck status page with graphs from logstash, and integrating elastic-recheck more closely with the page
  * Kibana 3 should, in theory, make this very easy: it takes YAML/JSON dashboard configuration that could be used to draw many graphs easily
  * requirements
    * percent of unclassified failures - gate fails
    * what projects are affected
    * percentage trending
  * what about using idle runs ... ?
  * take Jim's current approach - cron updated
* recheck bugs we don't know about
* Documentation? http://docs.openstack.org/developer/elastic-recheck doesn't work
  * we had planned on documenting it under ci.openstack.org/ but could publish elsewhere. It has not been written yet.
  * make it close to the logstash docs
  * ability to include a common section across 2 docs
* Review culture
  * self-approve on queries is ok - AGREED
* Test adds
  * making integration tests work live on Elasticsearch
* elastic-recheck emitting metrics to Graphite
* new CLI tools in the tree
* Bayesian analysis
  * on a failure, what is the chance that a particular line in the logs is indicative of the failure?
* virtual sprint late cycle?

This session will be the place to describe what additional features we should add to elastic-recheck, how we can make the tool even more useful, and how we can automate more of test failure analysis.

http://status.openstack.org/elastic-recheck/
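The Gearman triggering idea in the list above comes down to knowing when every log file for a job has been processed, so that elastic-recheck is notified once per job rather than once per file. The following is a minimal sketch only, assuming the client already knows which files each job is expected to produce; `JobCompletionTracker` and its methods are hypothetical names, and no real Gearman API is used or implied.

```python
from typing import Callable, Dict, Set


class JobCompletionTracker:
    """Hypothetical helper: fires a callback once every expected log file
    for a job has been reported as processed by a worker."""

    def __init__(self, on_complete: Callable[[str], None]) -> None:
        self._pending: Dict[str, Set[str]] = {}
        self._on_complete = on_complete

    def expect(self, job_id: str, filenames: Set[str]) -> None:
        # Assumption: the set of files a job produces is known up front.
        self._pending[job_id] = set(filenames)

    def file_processed(self, job_id: str, filename: str) -> bool:
        """Record one processed file. Returns True only on the call that
        completes the job (and triggers the callback)."""
        pending = self._pending.get(job_id)
        if pending is None:
            return False  # unknown job, or job already completed
        pending.discard(filename)  # duplicate reports are harmless
        if pending:
            return False
        del self._pending[job_id]
        self._on_complete(job_id)
        return True
```

In a real deployment the callback would be whatever tells elastic-recheck to start classifying the run; here it is just any callable taking the job id.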
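Two of the status page requirements listed above, the percentage of unclassified gate failures and which projects are affected, are simple aggregates over failed runs. A rough sketch, assuming each failure is available as a hypothetical `(project, bug)` pair where `bug` is `None` when no elastic-recheck query matched; the input shape and function name are illustrative, not part of elastic-recheck.

```python
from collections import Counter
from typing import Dict, Iterable, Optional, Tuple


def failure_stats(
    failures: Iterable[Tuple[str, Optional[str]]]
) -> Tuple[float, Dict[str, int]]:
    """Return (percent of failures with no matching bug, failures per project).

    Each item is (project, bug); bug is None for an unclassified failure.
    """
    total = 0
    unclassified = 0
    per_project = Counter()
    for project, bug in failures:
        total += 1
        per_project[project] += 1
        if bug is None:
            unclassified += 1
    # Avoid dividing by zero when there were no failures at all.
    percent = 100.0 * unclassified / total if total else 0.0
    return percent, dict(per_project)
```

Computing this per day or per week would give the "percentage trending" graph as a series of these values.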
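For the item about elastic-recheck emitting metrics to Graphite, Carbon's plaintext protocol is one straightforward option: each metric is a line of the form `<metric path> <value> <unix timestamp>` sent over TCP, by default to port 2003. The sketch below assumes that protocol; the metric name in the usage note (`elastic_recheck.bug_hits`) is a hypothetical example, not an existing metric.

```python
import socket
import time
from typing import Optional


def graphite_line(path: str, value: float, timestamp: Optional[int] = None) -> str:
    """Format one metric in Carbon's plaintext protocol:
    "<metric path> <value> <unix timestamp>\\n"."""
    if " " in path:
        raise ValueError("metric path must not contain spaces")
    if timestamp is None:
        timestamp = int(time.time())
    return f"{path} {value} {timestamp}\n"


def send_metric(host: str, path: str, value: float, port: int = 2003) -> None:
    """Send a single metric to a Carbon plaintext listener (default port 2003)."""
    with socket.create_connection((host, port), timeout=5) as sock:
        sock.sendall(graphite_line(path, value).encode("utf-8"))
```

For example, `graphite_line("elastic_recheck.bug_hits", 3, 1383825000)` produces `"elastic_recheck.bug_hits 3 1383825000\n"`. A statsd-style aggregator would be an alternative if many small counters are emitted.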
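One way to read the Bayesian analysis question above is as the posterior probability that a run failed given that a particular log line appeared, via Bayes' rule: P(fail | line) = P(line | fail) P(fail) / P(line), where P(line) = P(line | fail) P(fail) + P(line | pass) P(pass). The sketch below assumes we can count, over some window of runs, how many failed and how many failed or passed runs contained the line; the function and parameter names are hypothetical. With raw counts like these the result reduces to the fraction of runs containing the line that failed, so a real tool would likely add smoothing or a naive Bayes model over many lines.

```python
def p_failure_given_line(
    n_runs: int, n_failed: int, n_line_in_failed: int, n_line_in_passed: int
) -> float:
    """Estimate P(failure | line appears) from run counts using Bayes' rule."""
    if n_runs <= 0 or not 0 <= n_failed <= n_runs:
        raise ValueError("need n_runs > 0 and 0 <= n_failed <= n_runs")
    n_passed = n_runs - n_failed
    p_fail = n_failed / n_runs
    # Likelihoods of seeing the line in failed and in passed runs.
    p_line_given_fail = n_line_in_failed / n_failed if n_failed else 0.0
    p_line_given_pass = n_line_in_passed / n_passed if n_passed else 0.0
    # Total probability of seeing the line at all.
    p_line = p_line_given_fail * p_fail + p_line_given_pass * (1.0 - p_fail)
    if p_line == 0.0:
        raise ValueError("line was never observed")
    return p_line_given_fail * p_fail / p_line
```

For instance, a line seen in 8 of 10 failed runs and 2 of 90 passed runs gives a posterior of about 0.8.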