Recently I’ve been working on a highly distributed system in Google Cloud Platform (GCP) that is partitioned into many different services with various deployment strategies. A lot of it runs in Google Kubernetes Engine (GKE), a lot of it runs in Google Cloud Functions (GCF), and a lot of it runs on other bits and pieces of GCP infrastructure. Even the parts managed via GKE and Cloud Functions are heterogeneous, with many idiosyncratic infrastructure configurations.
This has made it hard to dig into the system as a whole: it can be tricky even to know that a service exists, let alone where to start looking for its logs and other insights into how it operates.
One tactic that has worked well so far is making full use of the query syntax in the GCP Logging tool. It is a fairly full-featured query language that lets you sift through large volumes of log data to identify what might be useful for understanding a particular service.
A good starting point is to track down the logs for a particular GKE container or Cloud Function by filtering on its name, using the resource.labels.container_name or resource.labels.function_name field respectively.
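If you end up scripting these searches rather than typing them into the query box, a tiny helper that builds the exact-match filter string can save some typing. This is a minimal sketch: the names LABELS, quote and equals_filter are hypothetical, foobar is just a placeholder, and the helper deliberately refuses values containing double quotes instead of attempting to escape them.

```python
# Hypothetical helpers that build Cloud Logging exact-match filters on resource labels.
LABELS = {
    "container": "resource.labels.container_name",  # GKE containers
    "function": "resource.labels.function_name",    # Cloud Functions
}


def quote(value: str) -> str:
    """Wrap a value in double quotes; escaping is out of scope for this sketch."""
    if '"' in value:
        raise ValueError("values containing double quotes are not supported here")
    return f'"{value}"'


def equals_filter(kind: str, name: str) -> str:
    """Build an exact-match filter for a container or function name."""
    return f"{LABELS[kind]}={quote(name)}"


if __name__ == "__main__":
    print(equals_filter("container", "foobar"))
    print(equals_filter("function", "foobar"))
```

The resulting string can be pasted into the Logging query box or passed as the filter argument to gcloud logging read.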
Make use of the query editor’s auto-complete, as it may suggest searchable attributes you weren’t previously aware of.
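Moving on, the single most useful tool is the regex operator, =~. The basic equality operator, =, is fine when you know the exact names you’re looking for, but on a sprawling project you are unfamiliar with you often don’t, so you need something more flexible. The regex operator lets you write much wider-reaching log queries.
Using =~ instead of = matches a field against a regular expression anywhere within its value, so a regex term on resource.labels.container_name finds log entries whose container name merely contains the pattern, rather than only those where it matches exactly.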
This way you can try to find logs for containers or functions whose exact name you can only guess at.
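Before running a guessed pattern, it can help to sanity-check it against names you already know about. The sketch below builds the =~ term and previews which of a list of known names the pattern would hit, using Python’s re.search, which is unanchored in the same way. Assumptions: the helper names and the example container names (foobar-api, legacy-foobar, billing-worker) are hypothetical, and Cloud Logging’s regexes use RE2 syntax rather than Python’s re, so only simple patterns are guaranteed to behave the same in both.

```python
import re

CONTAINER_NAME = "resource.labels.container_name"


def regex_filter(field: str, pattern: str) -> str:
    """Build a regex-match term such as field=~"pattern"."""
    if '"' in pattern:
        raise ValueError("patterns containing double quotes are not supported here")
    return f'{field}=~"{pattern}"'


def preview_matches(pattern: str, names: list[str]) -> list[str]:
    """Roughly approximate which known names the pattern would match.

    re.search is unanchored, so "foobar" matches anywhere in a name,
    while "^foobar" only matches names that start with it.
    Python's re is not RE2, so treat this as a local approximation only.
    """
    compiled = re.compile(pattern)
    return [name for name in names if compiled.search(name)]


if __name__ == "__main__":
    known = ["foobar-api", "legacy-foobar", "billing-worker"]  # hypothetical names
    print(regex_filter(CONTAINER_NAME, "foobar"))
    print(preview_matches("foobar", known))
    print(preview_matches("^foobar", known))
```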
You can combine this with the OR operator for even wider exploratory searches:
resource.labels.container_name=~"foobar" OR resource.labels.function_name=~"foobar"
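When you repeat this name search for many different guesses, generating the OR query saves retyping both field names each time. A minimal sketch, with hypothetical helper names, that reproduces the query above for any pattern:

```python
CONTAINER_NAME = "resource.labels.container_name"
FUNCTION_NAME = "resource.labels.function_name"


def regex_filter(field: str, pattern: str) -> str:
    """Build a regex-match term such as field=~"pattern"."""
    return f'{field}=~"{pattern}"'


def any_of(*terms: str) -> str:
    """Join terms with OR, so an entry matching any one of them is returned."""
    return " OR ".join(terms)


def name_search(pattern: str) -> str:
    """Search both container names and function names for the same pattern."""
    return any_of(
        regex_filter(CONTAINER_NAME, pattern),
        regex_filter(FUNCTION_NAME, pattern),
    )


if __name__ == "__main__":
    print(name_search("foobar"))
```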
You can also use parentheses to group terms and build arbitrary boolean queries:
(resource.labels.container_name=~"foobar" OR resource.labels.function_name=~"foobar") AND "baz"
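The same grouping can be generated programmatically. In this sketch (helper names are hypothetical), every OR group is wrapped in parentheses so the intent is explicit rather than relying on operator precedence, and "baz" is emitted as a bare quoted string, which, as I understand the Logging syntax, searches for that text across all fields of an entry.

```python
CONTAINER_NAME = "resource.labels.container_name"
FUNCTION_NAME = "resource.labels.function_name"


def regex_filter(field: str, pattern: str) -> str:
    """Build a regex-match term such as field=~"pattern"."""
    return f'{field}=~"{pattern}"'


def text(value: str) -> str:
    """A bare quoted string: a free-text term rather than a field comparison."""
    return f'"{value}"'


def group(*terms: str, op: str) -> str:
    """Join terms with a boolean operator and wrap the result in parentheses."""
    return "(" + f" {op} ".join(terms) + ")"


def name_and_text(pattern: str, needle: str) -> str:
    """Entries from a matching container or function that also mention needle."""
    names = group(
        regex_filter(CONTAINER_NAME, pattern),
        regex_filter(FUNCTION_NAME, pattern),
        op="OR",
    )
    return f"{names} AND {text(needle)}"


if __name__ == "__main__":
    print(name_and_text("foobar", "baz"))
```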
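Another useful element of the syntax is negative exclusion, i.e. the NOT operator, which can also be written as a - before the query term. Putting a - in front of a regex term on resource.labels.container_name, for instance, excludes any log entry whose container name matches that pattern.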
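A minimal sketch of building such an exclusion, producing either the - shorthand or the spelled-out NOT form. The helper names and the foobar placeholder are illustrative only.

```python
CONTAINER_NAME = "resource.labels.container_name"


def regex_filter(field: str, pattern: str) -> str:
    """Build a regex-match term such as field=~"pattern"."""
    return f'{field}=~"{pattern}"'


def negate(term: str, use_minus: bool = True) -> str:
    """Negate a term, either with the '-' shorthand or with NOT."""
    return f"-{term}" if use_minus else f"NOT {term}"


if __name__ == "__main__":
    term = regex_filter(CONTAINER_NAME, "foobar")
    print(negate(term))
    print(negate(term, use_minus=False))
```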
You can start with a wide exploratory query and then narrow it down by adding a negative exclusion for each irrelevant thing you see in the search results. This is a good way to home in on the specific logs that will be useful for your investigation.
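If you run this loop from a script, keeping the exclusions in a list makes each narrowing step a one-line change. The sketch below is one way to do it under a few assumptions: the helper names are hypothetical, the noisy names canary and test stand in for whatever you actually spot in the results, and the base query is parenthesised so the appended exclusions apply to the whole of it.

```python
CONTAINER_NAME = "resource.labels.container_name"


def exclude(field: str, pattern: str) -> str:
    """A negated regex term; '-' is the shorthand for NOT."""
    return f'-{field}=~"{pattern}"'


def narrow(base: str, noise: list[str], field: str = CONTAINER_NAME) -> str:
    """AND the (parenthesised) base query with one exclusion per noisy pattern."""
    terms = [f"({base})"] + [exclude(field, pattern) for pattern in noise]
    return " AND ".join(terms)


if __name__ == "__main__":
    query = 'resource.labels.container_name=~"foobar"'
    noise: list[str] = []
    for seen in ["canary", "test"]:  # hypothetical names spotted in the results
        noise.append(seen)
        print(narrow(query, noise))
```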