KRUG

How to run 40 000* tests in 10 min

*40702 and counting

I'm Krzysiek

SRE/devops

The talk has two parts: infrastructure and Ruby

But I hope you will find this presentation useful even if you have fewer tests

How do we have 40 000+ tests?

We run a mature Rails monolith

A well tested one with over 90% coverage

We use a standard git flow

Imagine you implemented an awesome feature

You make a PR, and wait for tests to finish

CI fails, you make a fix, wait again

Another spec fails. Frustrated?

CI should be fast to reduce developer frustration

How?

The answer is of course parallelization

Without it, the test suite takes 21 hours, 17 minutes*

*p95, 4 vCPU, 8 GiB RAM

Let's talk architecture

We use

Buildkite as our CI platform

It gives us flexible pipelines which are just templates, composed of steps.

Steps mostly run shell commands and can be conditional

```yaml
steps:
  - name: ":eslint:"
    command: "yarn install && yarn eslint"
  - name: ":rspec:"
    command: "bundle install && bundle exec rspec --color specs"
    parallelism: 10
```

Every new commit creates a build, which is run by agents

The most important feature of Buildkite: it runs on our own AWS infrastructure

We use autoscaling to dynamically set the number of nodes (EC2 instances), each of which hosts one or more agents.
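With `parallelism` set, Buildkite starts that many copies of the step and exposes `BUILDKITE_PARALLEL_JOB` (zero-based index) and `BUILDKITE_PARALLEL_JOB_COUNT` to each of them. A minimal sketch of how a job could pick its own share of spec files from those variables. The helper name `files_for_job` and the `specs/` glob are assumptions for illustration, not part of our pipeline:

```ruby
# Naive sharding sketch (hypothetical helper): each job takes every N-th
# file of the sorted list, where N is the number of parallel jobs.
def files_for_job(files, job_index, job_count)
  files.sort.select.with_index { |_file, i| i % job_count == job_index }
end

# Buildkite sets these for steps that use `parallelism`;
# the defaults make the script usable outside CI as well.
job_index = Integer(ENV.fetch("BUILDKITE_PARALLEL_JOB", "0"))
job_count = Integer(ENV.fetch("BUILDKITE_PARALLEL_JOB_COUNT", "1"))

# Assumed layout: spec files live under specs/, as in the step above.
spec_files = Dir.glob("specs/**/*_spec.rb")
puts files_for_job(spec_files, job_index, job_count)
```

This splits by file count only, so it says nothing about how long each share takes to run.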

In Chargify pipelines we rely on Docker extensively

How to run rspec in parallel effectively?

Enter the Knapsack gem. Knapsack splits all tests evenly across all parallel agents (152).

An even split is not enough. Some tests are fast, and some occasionally take forever
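For scale: 21 hours 17 minutes is 1277 minutes, so 152 perfectly balanced agents would each need about 8.4 minutes, ignoring setup time and hardware differences. One common way to get closer to that is to balance by recorded run time instead of by file count. The sketch below is a simple greedy assignment under that assumption. It is not Knapsack's actual implementation, and the file names and timings are made up:

```ruby
# Hypothetical sketch: hand out spec files longest-first, always to the
# agent with the smallest total so far.
def split_by_duration(durations, agent_count)
  agents = Array.new(agent_count) { { files: [], total: 0.0 } }
  durations.sort_by { |_file, seconds| -seconds }.each do |file, seconds|
    lightest = agents.min_by { |agent| agent[:total] }
    lightest[:files] << file
    lightest[:total] += seconds
  end
  agents
end

# Made-up timings in seconds, e.g. taken from a previous CI run.
timings = {
  "spec/models/plan_spec.rb" => 120.0,
  "spec/models/site_spec.rb" => 30.0,
  "spec/models/seller_spec.rb" => 30.0,
  "spec/controllers/sites_controller_spec.rb" => 60.0
}

split_by_duration(timings, 2).each_with_index do |agent, index|
  puts "agent #{index}: #{agent[:total]}s #{agent[:files].inspect}"
end
```

A split by count would give each agent two files here, which can mean 150 s on one agent and 90 s on the other. The split by time gives both 120 s. Recorded timings still drift, which is why a spec that "takes forever sometimes" can unbalance any static plan.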

Infrastructure tips & tricks

Experiment with hardware: switching the instance type gave us a 5% boost for the same price

Prepare your base image. You don't need to install Node and Ruby on every single build

Keep the databases you start with docker-compose on a ramdisk

```
# docker-compose.yml
services:
  mysql:
    image: mysql:8.0
    command: --default-authentication-plugin=mysql_native_password
    restart: always
    environment:
      - MYSQL_ALLOW_EMPTY_PASSWORD=true
    volumes:
      - type: tmpfs
        target: /var/lib/mysql
```

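Minimize layers in the Docker image. Instead of:

```
# Dockerfile
RUN cd frontend
RUN yarn install --frozen-lockfile --production
RUN yarn build
RUN yarn cache clean
```

Do:

```
RUN cd frontend && \
    yarn install --frozen-lockfile --production && \
    yarn build && \
    yarn cache clean
```

Besides saving layers, the chained form keeps the `cd` in effect: a `cd` in its own `RUN` does not carry over to the next instruction.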

Use bundle-cache ( https://github.com/sosedoff/bundle_cache ) to store zipped gems in S3. `bundle install` is the most time-consuming phase of our build.

That's all good, but we can do more with the specs themselves

Focus on the slowest specs. `rspec --profile path/to/specs.rb` might be sufficient

The next step is to check what a spec is doing by profiling it, e.g. with `StackProf`. It will give you call stacks, which you can then visualize as flamegraphs.

Or just go with EvilMartians test profiling toolbox ( https://github.com/test-prof/test-prof )

Profiling our specs brought an interesting result (and a story I can tell)

Ruby is fast

DBs were not...

but we abused them a bit

The slowest of our specs had one thing in common

They created our whole Chargify pricing in MySQL, and after 10 years on the market it has grown a bit

Almost 600 records saved to the db

`tapes_helper.rb` was born. It:

- records all SQL interactions
- dumps them to a `*.sql` file
- restores the db from the dump file instead, if the file exists

Caveat: must be executed before everything else.

Before:

```ruby
before(:each) do
  ChargifyAccount::Plan.bootstrap!
end
```

After:

```ruby
before(:each) do
  Tapes.use_tape("chargify_account_plan_bootstrap") do
    ChargifyAccount::Plan.bootstrap!
  end
end
```

YMMV.
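To make the record-and-replay idea concrete, here is a rough sketch. It is not the real `tapes_helper.rb`: the module name `TapeSketch`, the explicit `connection:` and `dir:` arguments, and `FakeConnection` are all assumptions for illustration, and a real helper would hook into the database adapter instead of being handed a connection.

```ruby
require "tmpdir"

# Hypothetical sketch of a "tape": record SQL once, replay it afterwards.
module TapeSketch
  # Wraps a connection and remembers every statement passed through it.
  class Recorder
    attr_reader :statements

    def initialize(connection)
      @connection = connection
      @statements = []
    end

    def execute(sql)
      @statements << sql
      @connection.execute(sql)
    end
  end

  def self.use_tape(name, connection:, dir:)
    path = File.join(dir, "#{name}.sql")
    if File.exist?(path)
      # Replay: skip the slow block and feed the dump to the database.
      File.read(path).split(";\n").each { |sql| connection.execute(sql) }
    else
      # Record: run the block against the recorder, then dump what it saw.
      recorder = Recorder.new(connection)
      yield recorder
      File.write(path, recorder.statements.join(";\n"))
    end
  end
end

# Stand-in for a real database connection, for demonstration only.
class FakeConnection
  attr_reader :executed

  def initialize
    @executed = []
  end

  def execute(sql)
    @executed << sql
  end
end

Dir.mktmpdir do |dir|
  first = FakeConnection.new
  TapeSketch.use_tape("plans", connection: first, dir: dir) do |db|
    db.execute("INSERT INTO plans (name) VALUES ('basic')")
    db.execute("INSERT INTO plans (name) VALUES ('pro')")
  end

  second = FakeConnection.new
  TapeSketch.use_tape("plans", connection: second, dir: dir) do |_db|
    raise "not reached: the tape already exists"
  end

  puts second.executed == first.executed
end
```

The second call never runs its block. It only replays the recorded statements, which is where the time saving comes from when the block is expensive Ruby code.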

Use the VCR gem (or a similar tool) to record HTTP requests and cut off internet access within your specs.

But update the recordings from time to time

From time to time we test interactions with a live external API. Our test suite has a conditional flag for it. It can be run on demand and also runs on a schedule every 4 hours.

Disable logging (free 5%)

```ruby
config.logger = Logger.new(nil)
config.log_level = :fatal
```

Combine slow integration specs

Before:

```ruby
it "assigns site from subdomain to @site" do
  get :index
  expect(assigns(:site)).to eq(site)
end

it "assigns seller from the site in the subdomain to @seller" do
  get :index
  expect(assigns(:seller)).to eq(site.seller)
end
```

After:

```ruby
it "assigns site from subdomain to seller and vice versa" do
  get :index
  expect(assigns(:site)).to eq(site)
  expect(assigns(:seller)).to eq(site.seller)
end
```

Replace `create(:model)` with `build_stubbed(:model)` or `build(:model)` (for FactoryBot users)

Keep your factories cascadeless

```ruby
factory :subscription do
  site
  seller
end

factory :site do
  seller
end

factory :seller do
end

create(:subscription) # How many records?
```

```
# 4 records:
# subscription
# site
# seller
# seller
```

## Conclusion

There is a thin line between speeding tests up and overdoing it. Keep your tests fast, but remember there are limits (and budgets)

### Thanks!