This is the fifth part in a series on applying devops principles and practices to game development. You can read the first post in the series, and see the entire series under the devops in game dev tag.
In our post on what the devops philosophy is, we wrote about revisiting workflow annoyances periodically. Sometimes you get more time and/or money. Sometimes you learn of an easy way to solve a problem.
There's something that got a lot easier for us recently: chatops.
"Chatops" is a trendy word for a subset of devops that focuses on streamlining work using extensible chatbots (e.g., Lita, Hubot, and Errbot) in team communication tools (e.g., Slack, HipChat, etc.). We use Lita on Slack, so I'll stick with those as concrete examples.
As a simple-but-nice example, you might ask Lita to run an automated build for you, and it will connect to Jenkins and run the build you ask for. You don't need to leave Slack, open a tab, log into Jenkins, find the job you need, and run it.
Something really important that well-implemented chatops provides is the ability to add context-appropriate information to conversations that are already happening.
Motivation
Here is a workflow annoyance: whenever sites are down or erroring, we end up with lots of browser tabs open that we need to:
- Refresh to keep up on the status of servers,
- Search for internal information (our wiki, email, or Google Docs archives), or
- Search externally for answers (Google, StackExchange sites, etc.).
This is all happening while we maintain a conversation in Slack to coordinate efforts.
Here's a related workflow annoyance: in our sprint planning meetings, we finish the meeting by planning our social media activity for the week, generally around any upcoming sales, project milestones, relevant holidays, etc. This happens in a mixture of:
- Slack,
- Our FPG calendars (where we put known "PR" dates, like sales),
- An internal tool we built for generating tagged links, and
- Hootsuite (the tool we often use to post to SM).
In both cases, there's a lot of switching—switching tabs, apps, and contexts. We sometimes ask, "A bunch of servers came back up, but was that everything?" and then have to find the tab showing server statuses and refresh to reassess. We're probably also getting a flurry of texts or calls from our server monitoring system.
The State of the Tools
With no existing tools to solve them, these weren't huge issues for us, but they were annoyances. We knew it was possible to write plugins for Lita that could run Jenkins jobs or retrieve server statuses, but the logic and wording would have to be rigid. Existing third-party plugins for Lita suffer from the same rigidity and from small feature sets that don't combine well with other plugins.
I recently attended Microsoft Ignite 2017, and it was eye-opening on the current state of developer tools.
One area that stood out was natural language processing, which has become far more readily available to "regular" developers in the last few years. Major systems like Google's Dialogflow (previously API.ai) and Microsoft's LUIS are incredibly easy for developers to leverage, and you get a lot for free with them.
After some setup (all in the fairly nicely-designed web interface if you wish), you can start asking questions and getting responses that essentially say, "The AI thinks this question is about Jenkins, asking to start a build named exploitzeroday."
You write the webhook that gets and interprets that response. Its logic tends to be "if this is about Jenkins and the user is asking to start a build, send an API request to Jenkins to start the build."
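That logic can be sketched in a few lines of Python. This is a minimal sketch, not our actual webhook: the `parsed` dictionary is a simplified, hypothetical stand-in for whatever your NLP service returns (real Dialogflow or LUIS payloads are more deeply nested), and `JENKINS_URL` and the function names are placeholders. The one real detail it relies on is that Jenkins starts a job when it receives a POST to `/job/<name>/build`; authentication is left out.

```python
import urllib.request
from urllib.parse import quote

# Placeholder base URL for illustration; substitute your own Jenkins server.
JENKINS_URL = "https://jenkins.example.com"


def build_request(parsed):
    """Return the HTTP request to send for a parsed message, or None.

    `parsed` is a hypothetical, simplified form of the NLP response, e.g.
    {"topic": "jenkins", "action": "start", "build": "exploitzeroday"}.
    """
    if parsed.get("topic") == "jenkins" and parsed.get("action") == "start":
        job = quote(parsed["build"], safe="")
        # Jenkins triggers a build on POST /job/<name>/build.
        return urllib.request.Request(
            f"{JENKINS_URL}/job/{job}/build", data=b"", method="POST"
        )
    return None  # Not something this webhook knows how to handle.


def handle(parsed, send=urllib.request.urlopen):
    """Carry out the action (if any) and return a reply for the chat."""
    request = build_request(parsed)
    if request is None:
        return "Sorry, I don't know how to do that yet."
    send(request)  # `send` is injectable so the logic can be tried offline.
    return f"Started the {parsed['build']} build."
```

Keeping the "what should happen" decision (`build_request`) separate from actually sending it makes it easy to try the logic without a live Jenkins server.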
This chatops system has three main components:
- A bot that listens in your communication tool, sends the text it receives to the AI (usually only when the bot's name is included in the message), and relays any responses back to the user;
- A natural language processing service that you've provided some topics and examples to; and
- A webhook you write that interprets what the AI says the user wants, carries out the action if there is one, and sends a message back to the user.
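The three pieces fit together roughly as follows. This is a sketch built from stand-ins, not Lita's or Dialogflow's real interfaces: `understand` fakes the NLP service with a keyword check, `webhook` fakes the action, and the bot name and all function names are hypothetical.

```python
import re

BOT_NAME = "lita"  # Hypothetical name users type to address the bot.


def mentions_bot(message, bot_name=BOT_NAME):
    """Component 1 (bot): only forward messages that mention the bot by name."""
    pattern = rf"\b{re.escape(bot_name)}\b"
    return re.search(pattern, message, re.IGNORECASE) is not None


def understand(message):
    """Component 2 (NLP service), faked here with keyword checks."""
    text = message.lower()
    if "start" in text and "build" in text:
        return {"topic": "jenkins", "action": "start"}
    return {"topic": None, "action": None}


def webhook(intent):
    """Component 3 (webhook): interpret the intent, act, and build a reply."""
    if intent["topic"] == "jenkins" and intent["action"] == "start":
        return "Starting the build."  # A real webhook would call Jenkins here.
    return "Sorry, I didn't understand that."


def on_message(message):
    """Wire the components together; return a reply, or None to stay silent."""
    if not mentions_bot(message):
        return None
    return webhook(understand(message))
```

With this wiring, a message that names the bot gets a reply, while ordinary conversation that never mentions the bot is ignored.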
There's probably an existing plugin to connect your bot to the AI—those are pretty popular for "adding personality to your bot" with random polite responses. You still have to program the interpreting webhook, and that seems like it will become a hodgepodge of mini-services: something that connects to Jenkins, something that connects to a server monitoring system, something that connects to Google Docs to search files, etc.
This Is Deep!
It is, but it can be broken up nicely. My first experiment was to search our PR calendar to answer questions like "When is our next sale?" and "When does the current sale end?" The Google Calendar API is surprisingly obnoxious to work with (authentication-wise), so it was a good test case.
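Once the calendar events are in hand, both questions reduce to simple date comparisons. The sketch below assumes the events have already been fetched and converted into `(name, start, end)` tuples; the sample sales and dates are invented for illustration, and the Google Calendar authentication and fetching are deliberately left out.

```python
from datetime import date

# Invented sample data standing in for events fetched from the PR calendar.
SALES = [
    ("Autumn Sale", date(2017, 11, 22), date(2017, 11, 28)),
    ("Winter Sale", date(2017, 12, 21), date(2018, 1, 4)),
]


def next_sale(sales, today):
    """Answer "When is our next sale?": the earliest sale starting after today."""
    upcoming = [sale for sale in sales if sale[1] > today]
    return min(upcoming, key=lambda sale: sale[1]) if upcoming else None


def current_sale(sales, today):
    """Answer "When does the current sale end?": the sale running today, if any."""
    for sale in sales:
        if sale[1] <= today <= sale[2]:
            return sale
    return None
```

Both functions return `None` when there is nothing to report, so the webhook can reply with something like "No sales are scheduled" instead of failing.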
Working it up took about 6-7 hours and cost no money beyond what we were already spending to host Lita and Slack. I even used the Firebase inline code editor that Dialogflow provides instead of setting up my own server. This was a quick-and-dirty "is this worth my time" proof of concept.
What Components Are Next?
Following that and a server status checker component, I made a list called the "ChatOps Wishlist". This is a living document, intended to be a sorted list of stuff we'd find useful. It's right in line with keeping a list of annoyances. Here's what that list actually looks like as of this writing:
- Run Jenkins jobs (done)
- Restart servers
- "New sprint": Make new retrospective document in Google Docs and start new Jira sprint if it doesn't exist
- Do a knowledge search (wiki + gdocs + jira? + blog? + slack? + highrise?)
- Create Jira tasks
- Get sales stats (Steam, itch, Humble sales for a time period)
- Get details of a Jira task/search (link, what sprint it's in, status, assignment)
- Get sprint status (burndown, story point status)
Most of those things are possible, at least in part. The difficulty isn't in natural language processing or the bot, but in the APIs provided by, for instance, Steam or Jira.
Remember, the goal of chatops is to have a bot that provides context-appropriate information and actions for conversations already happening. What drives our priorities will be based on how many times we have to go, "Hold on, let me go find/check that..." in Slack.
Final Tips
You can do just about all of this for free if you're already using a communication tool and chatbot. We had to provide payment information in order to connect to non-Google APIs, but that will still likely be free for the low volume of our traffic.
At some point, I'll want to move the webhook out of Google's weird Firebase thing (which is what the above payment information is for) and instead have Lita provide that webhook herself. I'm not sure how to do that yet, but I should be able to figure it out soon.
When you revisit your "annoyances" list, it's worth doing some broad research, especially if you normally keep your head down in a particular segment of the field (e.g., Unity, web development, etc.).
When you find something that seems all brand new, do what you can to break it up. Don't try to write 10 different chat topics and host it on your server and integrate with sites that don't even provide APIs... because it will feel like a major failure when everything is hard.
We're happy to answer any questions (conceptual or technical) in the comments!