# How to shuttle Gutenberg
Start with the Before Shuttling guide before merging anything in the Gutenberg repository. Once your turn comes, follow this guide!
# Pre-shuttle stability
Check in MarfeelDashboard; if something looks amiss, check with systems or in the Slack channel:
- Number of instances (should be around 30 for live).
- Number of requests/min.
- Errors are not too high.
Gutenberg Live Shuttles happen in 2 phases: first, a test on shadow traffic, followed by the real shuttle if everything is fine.
- When it's your turn to shuttle, notify the other people in the #shuttles-stakeholders channel of which feature you are going to merge, with a link to your PR.
For example:
- Shuttling live: feat(feature-toggle): add useMrf4u feature toggle
- Shuttling Insight: feat(API): added userRoles for Leroy
- Merge your commit into master.
- Go to Gutenberg in Jenkins and wait until the build is done.
# Shadow Traffic
- Go to the GutenbergNewShuttle job and select "Build with Parameters".
- Unless this is a revert, keep `lastSuccessfulBuild` as the `BUILD_SELECTOR` parameter. Check that the latest successful build is your build!
- Keep `TOMAHAWK_MINUTES` at 7, unless for some reason you need a longer test.
- Select `ENVIRONMENTS`: always `live`, currently the only production environment for Gutenberg.
- Keep `MARFEEL_BUCKET` as `a`, unless we want to test something critical in production. In that case, ask for help from the SYS team.
A stack of consumers per shard is now receiving a Tomahawk (a.k.a. traffic shadowing) to make sure the new build is stable enough to go into production.
- Obtain the name of the new stack from the Jenkins logs (Console Output). Search the page for a log line like:

```
11:09:28 StackName-->mrfLiveConsumer-1017060-0518T0903<--
```
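If you prefer to grab the stack name from a saved copy of the console output, the `StackName-->...<--` marker pair makes it easy to extract. A small sketch (the `console.txt` filename is just an illustration, not part of the tooling):

```shell
# Illustrative: a saved copy of the Jenkins console output.
echo '11:09:28 StackName-->mrfLiveConsumer-1017060-0518T0903<--' > console.txt

# Extract the stack name from between the --> and <-- markers:
STACK_NAME=$(sed -n 's/.*StackName-->\(.*\)<--.*/\1/p' console.txt)
echo "$STACK_NAME"
```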
- Open the TomahawkComparison Kibana dashboard, filter by the environment you have shuttled (`env:live`), and check that:
  - % of Java errors is stable.
  - % of invalidation errors is stable.
  - % of nginx errors (HTTP responses) is stable. Here you will see that many 304 (Not Modified) responses have disappeared. This is expected, as the new stack doesn't have them hashed.
  - No new errors are introduced in any visualization in the right column.
A correct Tomahawk picture looks like this:
Auto-refresh
Auto-refresh should be disabled by default. When enabled, it must always be set to more than a minute, to prevent saturating Elasticsearch.
# Test your functionality
You can now test the functionality you introduced by mapping your local requests to the new stack.
To do so, take the new stack's load balancer hostname from the build console output (e.g. live-C-1015328-1126T1644-1800532394.eu-west-1.elb.amazonaws.com) and ping it to obtain its IP. Then edit your local /etc/hosts and add the line `X.X.X.X live.mrf.io`, where X.X.X.X is the obtained IP.
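The steps above can be sketched as follows; the load balancer hostname is the example one from this guide, and yours will differ. The `hosts_line` helper and the `curl --resolve` alternative are our own suggestions, not part of the official tooling:

```shell
# Example hostname from this guide's build console output (yours will differ):
LB_HOST="live-C-1015328-1126T1644-1800532394.eu-west-1.elb.amazonaws.com"

# Resolve the stack's current IP (empty if the stack is already gone):
STACK_IP=$(getent hosts "$LB_HOST" | awk '{print $1}' | head -n1)

# Build the /etc/hosts entry; append it with e.g.
#   echo "$STACK_IP live.mrf.io" | sudo tee -a /etc/hosts
# and remember to remove the line once you are done testing.
hosts_line() { printf '%s live.mrf.io\n' "$1"; }
hosts_line "$STACK_IP"

# Alternatively, pin the IP for a single request without touching /etc/hosts:
#   curl --resolve "live.mrf.io:443:$STACK_IP" https://live.mrf.io/
```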
If you do test your functionality, be mindful of the Tomahawk stack's age.
# Revert during Tomahawk
If anything looks broken:
- Use the Jenkins job GutenbergShuttle-DestroyStack with your previously obtained stack name (e.g. mrfLiveConsumer-1015324-1126T1146) to shut down the stack you created.
- Revert your commit in GitHub, so that everyone else can keep working on master safely.
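The revert itself is a plain `git revert` on master. A throwaway-repo sketch of the flow (the commit message is borrowed from the example PR title earlier in this guide):

```shell
# Demo of the revert flow in a throwaway repo; in the real case you would
# run `git revert <sha>` on master in the Gutenberg repository and push.
REPO=$(mktemp -d)
cd "$REPO"
git init -q
git -c user.email=demo@example.com -c user.name=demo commit -q --allow-empty -m "base"

# A feature commit, as if it had just been shuttled:
echo "useMrf4u" > toggle.txt
git add toggle.txt
git -c user.email=demo@example.com -c user.name=demo commit -q \
  -m "feat(feature-toggle): add useMrf4u feature toggle"

# Revert it: this adds a new commit that undoes the change, so master
# history stays intact and everyone can keep working safely.
git -c user.email=demo@example.com -c user.name=demo revert --no-edit HEAD

[ -e toggle.txt ] || echo "toggle.txt is gone after the revert"
```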
# Shuttle
If everything is looking good in Tomahawk, change the DNS in production using the job GutenbergShuttle-ChangeDNS with your previously obtained stack name (e.g. mrfLiveConsumer-1015324-1126T1146).
This will direct the real traffic to your new stack, raise a new stack of producers and handle the cleanup of the previous stacks.
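One way to watch the cutover from your machine is to compare what live.mrf.io resolves to against the new stack's load balancer IP. A small sketch; the `dns_switched` helper and the idea of polling `dig` are our own illustration, not part of the official tooling:

```shell
# Hypothetical helper: compare the IP live.mrf.io currently resolves to
# with the new stack's load balancer IP (obtained earlier in this guide).
dns_switched() {  # usage: dns_switched <resolved-ip> <new-stack-ip>
  if [ "$1" = "$2" ]; then echo "switched"; else echo "still old stack"; fi
}

# With live values you would run something like:
#   dns_switched "$(dig +short live.mrf.io | head -n1)" "$NEW_STACK_IP"
dns_switched 203.0.113.7 203.0.113.7
dns_switched 198.51.100.9 203.0.113.7
```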
DNS change cannot wait forever!
Having this process available, you might think it a good idea to leave a Tomahawk running while you do something else and come back an hour later to check the results. Keep in mind, however, that we have processes that automatically shut down any consumers not receiving traffic.
Therefore, it is NOT SAFE to change DNS to a stack that has spent a long period without traffic.
Visit MarfeelDashboard and Anomalies Monitoring:
- Make sure the traffic is switching to the new build (color change) and that the error % does not go up.
- Check the number of instances. There should be a number of new instances (between 15 and 30) for the new build; from there, the old build's instance count should decrease over time until it is left at zero, while the new build ends up with a value similar to the one we started with.
Once everything is stable after the DNS change, announce in the #shuttles-stakeholders Slack channel that the whole process is finished.
WARNING
If for some reason your build must not be reverted without your knowledge, also say so in Slack (add non-revertible! at the end of your message).
# Revert in production
If something looks amiss after changing DNS, revert immediately.
- Request a fast revert in #shuttles-stakeholders channel using a general mention such as @here or @channel.
- Send a mail to pem@marfeel.com reporting on impact suffered.
- Revert your commit in GitHub, so that everyone else can keep working on master safely.
# Insight Shuttles
To shuttle MarfeelInsight, follow the same Before Shuttling guide, and communicate in the #shuttles-stakeholders Slack channel too.
Once the build is finished, the deployment is a single step:
Go to the GutenbergShuttle Jenkins job and start it for the `ins` environment.
Follow the same common recommendations in case of a revert.