In my last post, we discussed the different forms of instrumentation and how dynamic instrumentation can be useful. Let’s investigate how we might use debuggers to dynamically instrument live systems.
We will be using the delve debugger in this post. I recommend getting it installed and perhaps running through the getting started guide. If you have prior experience with delve or other debuggers, this won’t be necessary.
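If you don’t have it yet, delve is typically installed with the Go toolchain (the exact command may vary with your Go and delve versions):

go install github.com/go-delve/delve/cmd/dlv@latest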
Debuggers are awesome! This is how I feel when I start a debugger:
Most of the functionality of a debugger is implemented by the ptrace syscall. This syscall allows a “tracer” process (the debugger) to control a “tracee” process (your Go program). You can intercept the tracee’s signals and syscalls, read and write its registers and memory, and single-step through the program.
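To make that concrete, here is a minimal sketch of those ptrace primitives using Go’s syscall package. This is Linux/amd64 only, the PID is a placeholder, and error handling is elided:

package main

import (
	"fmt"
	"runtime"
	"syscall"
)

func main() {
	// ptrace requires all calls to come from the same OS thread.
	runtime.LockOSThread()

	pid := 12345 // placeholder: PID of the tracee

	// Attach: stops the tracee and makes us its tracer.
	syscall.PtraceAttach(pid)
	var ws syscall.WaitStatus
	syscall.Wait4(pid, &ws, 0, nil) // wait for the attach-stop

	// Read the tracee's registers and a few bytes of its memory.
	var regs syscall.PtraceRegs
	syscall.PtraceGetRegs(pid, &regs)
	buf := make([]byte, 8)
	syscall.PtracePeekData(pid, uintptr(regs.Rip), buf)
	fmt.Printf("bytes at PC 0x%x: % x\n", regs.Rip, buf)

	// Execute exactly one instruction, then resume and detach.
	syscall.PtraceSingleStep(pid)
	syscall.Wait4(pid, &ws, 0, nil)
	syscall.PtraceDetach(pid)
}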
Since the debugger can write to the memory of the tracee, this allows it to set breakpoints that will suspend execution of the tracee once they are hit. We can then inspect the state of the tracee, allowing us to add arbitrary instrumentation logic. It is super interesting how this works. In order to set a breakpoint, the debugger:

1. Reads and saves the original instruction at the breakpoint address.
2. Overwrites it with a trap instruction (INT3, 0xCC, on x86).

When the tracee executes the trap, it is suspended with a SIGTRAP and the debugger is notified.
When you tell the debugger to continue, it:

1. Restores the original instruction at the breakpoint address.
2. Rewinds the tracee’s program counter back to that instruction.
3. Single-steps the tracee over it.
4. Writes the trap instruction back so the breakpoint stays armed.
5. Resumes the tracee.
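If you want to see those mechanics spelled out, here is a rough sketch with raw ptrace calls. Again this is Linux/amd64 only (INT3 is x86’s trap instruction); pid and addr are placeholders, the attach dance is assumed to have happened already, and errors are elided:

package breakpoints

import "syscall"

// setBreakpointAndContinue shows the raw mechanics: pid is an attached
// tracee, addr is the instruction to break on (both placeholders).
func setBreakpointAndContinue(pid int, addr uintptr) {
	var ws syscall.WaitStatus

	// Setting the breakpoint:
	// 1. Save the original instruction byte at the target address.
	orig := make([]byte, 1)
	syscall.PtracePeekData(pid, addr, orig)
	// 2. Overwrite it with INT3 (0xCC), the x86 trap instruction.
	syscall.PtracePokeData(pid, addr, []byte{0xCC})

	// Resume the tracee; it stops with SIGTRAP when it hits the trap.
	syscall.PtraceCont(pid, 0)
	syscall.Wait4(pid, &ws, 0, nil)

	// Continuing past the breakpoint:
	// 1. Restore the original instruction and rewind the PC to it.
	syscall.PtracePokeData(pid, addr, orig)
	var regs syscall.PtraceRegs
	syscall.PtraceGetRegs(pid, &regs)
	regs.Rip = uint64(addr)
	syscall.PtraceSetRegs(pid, &regs)

	// 2. Single-step over it, re-arm the trap, and keep going.
	syscall.PtraceSingleStep(pid)
	syscall.Wait4(pid, &ws, 0, nil)
	syscall.PtracePokeData(pid, addr, []byte{0xCC})
	syscall.PtraceCont(pid, 0)
}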
So cool! And you thought debuggers were magic!
This animation demonstrates the overall flow of what happens when you attach and hit a breakpoint:
As you can see, when the debugger is running, your code is suspended. That is important to keep in mind.
The major problem with using debuggers to add instrumentation to a running process is suspended execution. In a production environment, you might not be able to stop the process to inspect its state. Additionally, debuggers are normally blocked on user input, so interacting with your process in this way will prevent your process from running and doing useful work.
We can try to minimize the time we are in a suspended state by automating the debugger. Delve exposes a JSON-RPC API that we can use to drive it. They even provide their own Go client. So helpful! This is a technique I’ve used in the past and it works surprisingly well for a lot of use cases.
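For reference, the controller snippets that follow use delve’s Go client packages. As of current go-delve, the imports look like this (version pinning elided):

import (
	"github.com/go-delve/delve/service/api"
	"github.com/go-delve/delve/service/rpc2"
)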
Let’s create a test program that requires dynamic instrumentation. We can then write a program that automates delve to add the instrumentation we want.
Note: The full source code for this experiment can be found here.
Our test program is just a hot loop that will increment a counter:
package main

import "fmt"

var counter int64

//go:noinline
func doWork() {
	counter++
}

func main() {
	fmt.Println("This is a blackbox. Read my counter.")
	for {
		doWork()
	}
}
The function we are instrumenting is trivial, so we add the noinline pragma to prevent inlining. In real-world situations, this likely isn’t necessary, and if your function was indeed inlined there are ways of working around this.
Let’s add instrumentation to periodically read this global counter.
First, we need to launch delve in a headless mode and attach to our process:
dlv := exec.Command(
	"dlv",
	"attach",
	strconv.Itoa(pid),
	"--headless",
	"--accept-multiclient",
	"--api-version=2",
	"--listen="+addr,
)
if err := dlv.Start(); err != nil {
	log.Fatalf("Failed to start delve: %s", err)
}
This starts up delve. Delve then attaches to the process, suspends its execution, and starts a listener on the address we provided. This will allow us to communicate with delve and send it API commands.
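For context, here is one (hypothetical) way the controller might define pid and addr before the snippet above; the port is a placeholder:

// Take the target PID from the command line and pick a local
// address for delve's API listener.
pid, err := strconv.Atoi(os.Args[1])
if err != nil {
	log.Fatalf("invalid pid %q: %s", os.Args[1], err)
}
addr := "127.0.0.1:9229" // placeholder port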
Next, we need to wait for the listener to be up:
waitToBeUp(addr)
This function attempts to dial the address repeatedly. If the listener hasn’t come up within 5 seconds, the controller exits with an error:
func waitToBeUp(addr string) {
	done := time.After(5 * time.Second)
	for {
		conn, err := net.DialTimeout("tcp", addr, 2*time.Second)
		if err == nil {
			conn.Close()
			return
		}
		select {
		case <-done:
			log.Fatalf("Stuck waiting for %q to be up: %s", addr, err)
		default:
		}
	}
}
Now that we can connect to the debugger we can create our client:
client := rpc2.NewClient(addr)
client.Continue()
defer func() {
	client.Halt()
	client.Detach(false)
}()
The first thing the client needs to do is tell the debugger to continue the process. This makes sure it isn’t suspended for longer than needed during this setup phase. We then defer some cleanup routines to safely detach the debugger from the process when our controller exits.
We now need to find what instruction to instrument. We know our function name is doWork. All symbol names in Go binaries are fully qualified, so the symbol name for doWork is main.doWork. We can ask delve to resolve this information for us:
locations, _ := client.FindLocation(api.EvalScope{
	GoroutineID: -1,
}, "main.doWork")
pc := locations[0].PC
Now that we know the address of our function we can create our breakpoint:
bp := &api.Breakpoint{
	Name:      "myAwesomeBreakpoint",
	Addr:      pc,
	Variables: []string{"counter"},
}
Along with the memory address of our function, this contains a friendly name and a variable we would like to read when the breakpoint is hit.
We are now ready to start making some measurements. Here is the code to do so:
for {
	client.Halt()
	client.CreateBreakpoint(bp)
	state := <-client.Continue()
	client.ClearBreakpointByName(bp.Name)
	client.Continue()
	counter := state.CurrentThread.BreakpointInfo.Variables[0].Value
	fmt.Printf("counter: %s\n", counter)
	time.Sleep(time.Second)
}
Let’s break this down. First, we tell delve to halt the process; this is needed to run commands like CreateBreakpoint, which writes the trap instruction. We then call Continue, which gives us a channel. This channel will block until the breakpoint is hit. Once it is hit we can grab the debugger state, clear the breakpoint, and continue again. This avoids the breakpoint firing on every invocation of doWork. We can now process the counter value and sleep.
Let’s run this and see what happens!
On the left you see the counter program running. I added some additional instrumentation to output the counter state and increment rate. On the right is our controller process that spawns delve. You can see that when instrumentation is added, the rate is barely affected.
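That extra rate instrumentation isn’t shown in the snippets above. Here is a sketch of what it could look like inside the test program (illustrative only: the counter++ in doWork would need to become atomic.AddInt64(&counter, 1), with the matching time and sync/atomic imports, for this to be race-free):

// Print the counter and its per-second delta once a second.
go func() {
	var last int64
	for range time.Tick(time.Second) {
		cur := atomic.LoadInt64(&counter)
		fmt.Printf("counter: %d rate: %d/s\n", cur, cur-last)
		last = cur
	}
}()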
Pretty cool stuff! Unfortunately, this technique only works for processing events at a low frequency. In this situation, we are hitting a breakpoint only once every second. What if we tried to sample every invocation of doWork?
Ouch! That is quite the performance hit! There are a number of reasons this has such a large impact on performance. There is a lot of cost in context switching between the process, the kernel, and the debugger. Additionally, the time the process is suspended on each hit is small, but it adds up to significant overhead when the breakpoints are firing frequently. As a rough illustration: if each breakpoint hit costs, say, tens of microseconds of round trips, the loop is capped at tens of thousands of iterations per second, orders of magnitude below what it manages unimpeded.
In the next post, we will investigate using uprobes and BPF to instrument events that occur at a higher frequency!