Instrumentation
Note
Starting slide 11
- Software instrumentation: a technique to insert measurement or monitoring mechanisms into software
- e.g. adding timers around key functions, counting database queries/API calls, measuring resource usage (CPU, memory, …)
- Measurement: defines what to collect
- Instrumentation: defines how to collect it
- Intrusive instrumentation: modifying the original source code by inserting chunks of source code to collect data for analytical purposes (e.g. logging frameworks)
- problems:
- requires access to and understanding of the source code
- requires significant upfront planning so that instrumentation code is implemented as part of the normal system implementation:
- requirements define what data needs to be captured
- design must incorporate the instrumentation and integrate it into the overall system
- part of the normal dev process → not an afterthought
- hard to remove/modify when the instrumentation is no longer needed and/or requirements change (the instrumentation code is interleaved with the business logic and scattered across the codebase, so changing it means editing and retesting many places)
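A minimal sketch of intrusive instrumentation (class and method names are hypothetical, not from the slides): timing code inserted directly into a business method, mixed with the logic it measures:

```java
// Hypothetical example of intrusive instrumentation:
// the timing code is inserted directly into the business method
// and must be written, maintained, and removed by hand.
public class OrderService {
    public int processOrder(int quantity, int unitPrice) {
        long start = System.nanoTime();           // <- inserted measurement code
        int total = quantity * unitPrice;         // actual business logic
        long elapsed = System.nanoTime() - start; // <- inserted measurement code
        System.out.println("processOrder took " + elapsed + " ns");
        return total;
    }
}
```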
- Unguarded instrumentation: instrumentation inserted directly into the source code without any restrictions/constraints on when/how it is executed
- Guarded instrumentation: instrumentation placed directly into source code, but with restrictions/constraints on when/how it is executed
- Proxy instrumentation: place instrumentation code in a proxy object and proxy calls real implementation
- problems:
- extra boilerplate: a proxy class must be written and maintained for each instrumented component
- only calls that go through the proxy are captured (e.g. internal calls within the real implementation bypass the instrumentation)
// ... guarded instrumentation
public void doSomething() {
    if (INSTRUMENT)                        // <- this is the guard
        log.i("Counter: " + this.counter_);
}
// ...

// ... proxy instrumentation
private final SC sc_impl_;                 // wrapped real implementation

public SC_Inst(SC sc_impl) {               // <- instrumentation object
    this.sc_impl_ = sc_impl;
}

public void doSomething() {                // <- instrumentation code
    log.i("Counter: " + this.sc_impl_.getCounter());
    this.sc_impl_.doSomething();           // proxy calls real implementation
}
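The proxy snippet above can be expanded into a self-contained sketch. The names (Service, ServiceImpl, ServiceProxy) are hypothetical, and System.out stands in for the logging framework; the key point is that the proxy shares an interface with the real implementation, so it is a drop-in replacement:

```java
// Hypothetical, self-contained sketch of proxy instrumentation.
interface Service {
    void doSomething();
    int getCounter();
}

class ServiceImpl implements Service {      // real implementation, no instrumentation
    private int counter = 0;
    public void doSomething() { counter++; }
    public int getCounter() { return counter; }
}

class ServiceProxy implements Service {     // all instrumentation lives here
    private final Service impl;
    ServiceProxy(Service impl) { this.impl = impl; }
    public void doSomething() {
        System.out.println("Counter before call: " + impl.getCounter());
        impl.doSomething();                 // delegate to real implementation
    }
    public int getCounter() { return impl.getCounter(); }
}

public class ProxyDemo {
    public static void main(String[] args) {
        Service s = new ServiceProxy(new ServiceImpl());
        s.doSomething();
        s.doSomething();
        System.out.println("Counter: " + s.getCounter()); // prints "Counter: 2"
    }
}
```

Removing the instrumentation means swapping the proxy for the real implementation at the construction site, without touching ServiceImpl.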
- Logs: record of events or messages that occur in a software system
- Application logs: provides information about the behaviour of the system (e.g. errors, warnings, status updates)
- System logs: generated by the operating system or infrastructure components, such as log messages from system services, system events, or performance metrics
- Metrics: numerical value that describes the performance or behaviour of a software system (e.g. response time, system resources, user engagement/conversion rate, …)
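As an illustration (hypothetical class, not from the slides), a response-time metric can be as simple as collecting timing samples and aggregating them:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: a hand-rolled response-time metric.
public class ResponseTimeMetric {
    private final List<Long> samplesNanos = new ArrayList<>();

    // record one measured response time, in nanoseconds
    public void record(long nanos) { samplesNanos.add(nanos); }

    // aggregate the samples into a single metric value
    public double averageNanos() {
        if (samplesNanos.isEmpty()) return 0.0;
        long sum = 0;
        for (long s : samplesNanos) sum += s;
        return (double) sum / samplesNanos.size();
    }
}
```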
- Trace: provides a complete picture of the path taken by a request as it travels through a system → used for debugging (helps us understand control flow, timing, interactions)
- Event (system-level): low level system/application events to analyze performance and behaviour on a single machine
- Distributed tracing: tracks a request as it moves across multiple services to understand end-to-end flow and latency in distributed systems
- Problems with instrumentation & data analysis:
- different tracing tools use incompatible formats, making cross-system analysis hard
- trace data grows fast and is expensive to store and query
- large trace datasets are difficult to visualize and interpret
- traces are hard to link with logs and metrics
- many tools cannot provide instant feedback
- hard to track events across multiple nodes or services
- Distributed tracing: a process of collecting end-to-end transaction graphs in near real time
- Trace: represents the entire journey of a request
- Span: represents a single operation call
- Tags/logs: to annotate the spans with some contextual information (tags apply to the whole span, logs represent some event that happened during the span)
- a log always has a timestamp that falls within the span’s start-end time interval
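A minimal span model (hypothetical names, loosely following OpenTracing-style concepts) that holds whole-span tags and timestamped log events:

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of a span: one operation call within a trace.
public class Span {
    static class LogEvent {
        final long timestamp;   // must fall within [start, end]
        final String message;
        LogEvent(long timestamp, String message) {
            this.timestamp = timestamp;
            this.message = message;
        }
    }

    final String operation;
    final long start;
    long end;
    final Map<String, String> tags = new HashMap<>(); // annotate the whole span
    final List<LogEvent> logs = new ArrayList<>();    // events during the span

    Span(String operation, long start) {
        this.operation = operation;
        this.start = start;
    }

    void log(long timestamp, String message) {
        logs.add(new LogEvent(timestamp, message));
    }

    void finish(long end) { this.end = end; }
}
```

Usage: create a span at the start of an operation, attach tags and log events while it runs, and finish it when the operation completes; a tracer would then export the span as part of the trace.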
- Monolith architecture: single, tightly coupled application structure (all components in one unit)
- pros:
- simple to build, test, and deploy
- easy to use for small systems
- cons:
- difficult to scale or modify as codebase grows
- a change in one part can break the whole system
- slow deployment cycles and limited flexibility
- performance bottlenecks due to shared resources
- Microservice architecture: application is split into independent, loosely coupled services, each service has its own process and communicates via APIs
- pros:
- scale only the parts that need it (scalability)
- failures are isolated
- use different languages or technologies (flexibility)
- update services independently (fast deployment)
- easy to test and maintain (modularity)
Note
Ended slide 44