What I learned from setting up CI for a COVID-19 Exposure Notification Open Source project

Stephanie Wang
Google Cloud - Community
Apr 23, 2021 · 8 min read

The COVID-19 Exposure Notifications System went live last year during the coronavirus outbreak. Google and Apple jointly created and released interoperable systems to help public health authorities notify and give guidance to users of their apps who may have been exposed to someone who has contracted COVID-19. As part of a project that provides privacy-preserving analytics to health authorities, I got involved in setting up continuous integration (CI) for the ENPA Google repo. Our project team comprises developers from different orgs and backgrounds, including Android, Google Cloud, and Security. The ENPA project contains the Google ingestor pipeline code, where CI helps ensure code quality during development. Here are some lessons I learned along the way.

Understanding the goals and non-goals

The general goal communicated to me was straightforward: set up CI to run tests when each pull request is opened, so that any test failure prevents the pull request from being merged into the main branch of the codebase. However, a couple of questions immediately popped into my head: What kind of tests are these? Do they require secrets to run? Are there other things, such as a linter or a code formatter, that we should run as well to maintain code quality?

To answer these questions and clarify requirements, I set up a meeting with the project leads and scoped out our goals and non-goals so that I could decide on the CI jobs to run and how to run them. Through our discussion, it became clear that our goals were to:

  1. ensure code quality when developers contribute to the repo;
  2. enforce code quality by leveraging systematic CI.

We don’t need to:

  1. publish artifacts to Maven Central from our repo;
  2. run tests using multiple Java versions such as Java 7 and 8;
  3. run the “light” versions of tests on presubmit and the comprehensive test suite nightly, although this is something we might consider doing in the future.

These goals and non-goals helped me prioritize: the CI platform I chose needed to accommodate all of these present and potential requirements without being overly complex. Most critically, I needed CI to run unit and integration tests on Java 11 whenever a pull request is opened. The setup also needs to support continuous jobs when needed. On top of that, we can set up jobs to run a license header check, a linter, and dependency checks to further ensure code quality.

Through defining goals and non-goals up front, I was able to understand what was needed and, more importantly, what was not needed for the CI setup. This allowed me to deliver an impactful solution quickly.

Deciding on a suitable CI platform: be future-proof but not over-engineered

Based on our goals, I needed to select the appropriate CI tools to build our pipelines. For unit tests, the linter, dependency checks, and the code formatter, none of which require service accounts or testing secrets to run, I decided on GitHub Actions: from past experience, GitHub Actions workflows are easy to configure, and they are also fast and free. I had migrated all of the googleapis repos’ presubmit unit test jobs to GitHub Actions before working on ENPA.

GitHub Actions in the BigQuery Java client library repo (https://github.com/googleapis/java-bigquery)
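For illustration, a minimal presubmit workflow along these lines might look like the sketch below. The file would live under `.github/workflows/`, and the job layout here is my assumption rather than the repo’s actual configuration:

```yaml
# Hypothetical presubmit workflow: run unit tests on every pull request.
name: ci
on:
  pull_request:
    branches:
      - main
jobs:
  units:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v2
      - uses: actions/setup-java@v1
        with:
          java-version: 11
      # Unit tests only; integration tests need secrets and therefore
      # run elsewhere (see the Cloud Build discussion below).
      - run: ./mvnw test
```

A required status check on the main branch then blocks merging whenever this job fails, which is exactly the enforcement behavior we wanted.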

However, for integration tests, I needed to do more research. Most GCP Java client libraries familiar to me run integration tests on Google’s Kokoro. For example, the GCP BigQuery Java client library has three types of integration tests: continuous, nightly, and presubmit, each run by its own trigger. In addition, we have a release job that handles releasing GCP google-cloud-* artifacts to Maven Central so that GCP customers can access the client libraries and use them to interact with GCP APIs.

google-cloud-bigquery artifact released in Maven Central

The benefit of using Kokoro is that we can safely store service accounts and testing secrets internally. It also allows us to securely release google-cloud-* artifacts to Maven Central by providing us with OSS Sonatype signing keys at release time. One major downside of Kokoro is the complexity of the setup: we need to configure jobs internally in the Google codebase, define build configurations on GitHub, and then configure webhooks to connect the internal jobs to the external build configurations. The setup gets even more complicated and slow when the repo is private, which was the case for us during most of the development of the first versions that went to production. The ENPA Google Ingestor repo, on the other hand, does not need to release artifacts to Maven Central, which removes the need for us to prepare OSS signing keys. Given all the requirements and effort necessary to set up Kokoro, we decided against it.

The Kokoro job for integration tests in googleapis repos

We investigated other platforms and finally decided on Cloud Build. Cloud Build can build and test across different programming languages with customized workflows. This is particularly important to a project like ENPA, where developers come from different teams and backgrounds and pick the tools and languages of their choice. Some integration tests run with `mvn` while others run with `npm`, so the platform needs to accommodate different testing environments. It also needs to be able to run different types of integration tests depending on the event trigger. For instance, we may want to run the test that processes 100k data shares only nightly, but run the test that processes 10k data shares on each pull request for faster turnaround and less data-processing overhead.

The Cloud Build job for integration tests in the google/exposure-notifications-private-analytics-ingestion repo
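To make this concrete, here is a hypothetical `cloudbuild.yaml`, not the project’s actual configuration. The `_NUM_SHARES` substitution and the `test.num.shares` system property are made up to show how one config can serve both presubmit and nightly triggers:

```yaml
# Hypothetical Cloud Build config: each trigger (presubmit, nightly)
# overrides _NUM_SHARES so the same config runs different test sizes.
substitutions:
  _NUM_SHARES: "10000"
steps:
  # Maven-based integration tests against the testing project.
  - name: "maven:3-jdk-11"
    entrypoint: "mvn"
    args: ["verify", "-Dtest.num.shares=${_NUM_SHARES}"]
  # npm-based tests; the Firebase emulator wrinkle is discussed below.
  - name: "node:14"
    entrypoint: "npm"
    args: ["test"]
```

A presubmit trigger would keep the default of 10k shares, while the nightly trigger would substitute 100k, so the heavier test never slows down pull requests.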

Picking a suitable CI platform not only helped us accomplish the task at hand successfully but also helped us save valuable engineering hours.

Thinking about testing infrastructure and the bigger picture

Once I decided on Cloud Build, I quickly jumped into implementation. I created a Cloud Build pipeline in our testing project and granted the default Cloud Build service account access to our testing secrets in order to run the integration tests.

This was immediately rejected during code review. It turned out I was unaware of the limitation that there can be only one Firebase project per GCP project, and we could be running integration tests against different Firebase projects in the future. This meant that my initial solution would require recreating the CI pipeline for each new GCP testing project!

A better approach is to separate the CI pipeline from the testing projects: put the CI pipeline in a dedicated infrastructure project and grant its service account permission to run integration tests in the various testing projects. This allows for much more flexibility and requires a lot less work to configure. Centralizing the build and test configurations also means easier maintenance in the long run.
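Concretely, the cross-project setup boils down to IAM bindings in each testing project. A hedged sketch of such a policy fragment, with a placeholder account number and a deliberately generic placeholder role:

```yaml
# Hypothetical IAM policy fragment for a testing project (applied with
# `gcloud projects set-iam-policy`). It grants the infrastructure
# project's default Cloud Build service account whatever roles the
# integration tests need; the number and role below are placeholders.
bindings:
  - role: roles/editor # placeholder; scope down to what the tests need
    members:
      - serviceAccount:123456789@cloudbuild.gserviceaccount.com
```

Adding a new testing project then only requires a new binding there, with no new pipeline to build.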

Thinking about the big picture and the infrastructure at large leads to sounder decisions about how the setup should look, so that it can be maintained more easily in the long term.

Now that we have a Minimum Viable Product (MVP) that satisfies our basic requirements, we can start running it and see if there are any potential improvements we can make to the CI pipeline.

Applying general software engineering principles and best practices

Once our new CI pipeline was running, we began to observe some areas for improvement.

For instance, our initial CI pipeline ran `./mvnw clean verify` for integration tests each time. This meant downloading Maven packages on every run since, by default, Cloud Build starts containers with empty local Maven repositories. As an improvement, we started caching these packages in Google Cloud Storage, which cut our integration test runs from seven minutes to three. There are certainly benefits to running with a blank slate each time, but caching these packages locally significantly sped up our CI.
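As a rough sketch of the idea (the bucket name and cache path are assumptions, not our actual setup, and a first run against an empty bucket would need the download step to tolerate failure):

```yaml
steps:
  # Pull the cached local Maven repository, if any, before the build.
  - name: "gcr.io/cloud-builders/gsutil"
    args: ["-m", "rsync", "-r", "gs://my-ci-cache/m2", "/workspace/.m2"]
  # Point Maven at the cached repository inside the workspace.
  - name: "maven:3-jdk-11"
    entrypoint: "bash"
    args: ["-c", "./mvnw clean verify -Dmaven.repo.local=/workspace/.m2"]
  # Push newly downloaded packages back to the cache for the next run.
  - name: "gcr.io/cloud-builders/gsutil"
    args: ["-m", "rsync", "-r", "/workspace/.m2", "gs://my-ci-cache/m2"]
```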

We are also using a Java 11 gcr container to run our npm Firebase emulator tests, since the Firebase emulator requires Java to run, but this also means that we install npm each time we run the tests. To address this, we could build or fetch a gcr image that has both Java and npm. However, by doing so, we would take on the responsibility of maintaining that new container.
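For illustration, the current approach amounts to something like the following step (the image name is assumed, not our actual one):

```yaml
steps:
  # A Debian-based Java 11 image, because the Firebase emulator needs
  # a JRE; npm is then installed from scratch on every run.
  - name: "openjdk:11"
    entrypoint: "bash"
    args:
      - "-c"
      - |
        # A custom image with both Java and npm pre-installed would
        # skip this install, at the cost of maintaining that image.
        apt-get update && apt-get install -y npm
        npm ci
        npm test
```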

None of these improvements should block the rollout of our CI pipeline. It is good if you already know how to get some of them working quickly, but otherwise, configuring and deploying a working CI pipeline first and then improving it iteratively has much more impact than waiting to roll it out.

As a general rule of thumb, get a working CI pipeline in place before making improvements or upgrades. Fail fast and fail often to iterate quickly. It is tempting to configure an efficient and powerful CI pipeline before putting it to the test, but the reality is that CI is indispensable to committing code: something needs to be running to ensure that no committed code breaks the existing codebase. Getting CI up and running is critical for any project.

Final thoughts

No part of DevOps is simple. I found it much less painful after having exhaustive discussions with the other software engineers and stakeholders involved to focus on what is important. Nothing is worse than setting everything up so that it works and then having to modify it completely; the credential and permissions management alone can be mind-boggling. So always begin with understanding the goals and non-goals. I have found this to be true in software engineering and now also in DevOps. If you have any questions, feel free to message me on Twitter.

Special thanks to Robert Kubis for his guidance and help with this project and blogpost.
