Instrumentation

Note

Starting slide 11

  • Software instrumentation: a technique to insert measurement or monitoring mechanisms into software
    • e.g. adding timers around key functions, counting database queries/api calls, measuring resource usage (CPU, memory, …)
    • Measurement: defines what to collect
    • Instrumentation: defines how to collect it
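As a concrete sketch of "adding timers around key functions" (the method name doSomething and the use of System.nanoTime are illustrative assumptions, not from the slides): the measurement is the elapsed time, the instrumentation is the timer wrapped around the call.

```java
public class TimingExample {
    // hypothetical function whose runtime we want to measure
    static long doSomething() {
        long sum = 0;
        for (int i = 0; i < 1_000_000; i++) sum += i;
        return sum;
    }

    public static void main(String[] args) {
        // instrumentation: HOW we collect it (a timer around the call)
        long start = System.nanoTime();
        long result = doSomething();
        long elapsedNs = System.nanoTime() - start;
        // measurement: WHAT we collect (elapsed wall-clock time)
        System.out.println("doSomething took " + elapsedNs + " ns, result=" + result);
    }
}
```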
  • Intrusive instrumentation: modifying the original source code by inserting chunks of source code to collect data for analytical purposes (e.g. logging frameworks)
    • problems:
      • requires access to, and understanding of, the source code
      • requires lots of upfront planning so that the instrumentation code is implemented as part of the normal system implementation
      • requirements define what data needs to be captured
      • design must integrate the instrumentation into the overall system
      • part of the normal dev process, not an afterthought
      • hard to remove/modify when the instrumentation is no longer needed and/or requirements change, because it is scattered through and entangled with the application logic
    • Unguarded instrumentation: instrumentation inserted directly into the source code without any restrictions/constraints on when/how it is executed
    • Guarded instrumentation: instrumentation placed directly into source code, but with restrictions/constraints on when/how it is executed
    • Proxy instrumentation: place instrumentation code in a proxy object and proxy calls real implementation
// ... guarded instrumentation
public void doSomething() {
    if (INSTRUMENT) { // <- this is the guard
        log.i("Counter: " + this.counter_);
    }
    // ... normal implementation
}
// ...
// ... proxy instrumentation
public class SC_Inst implements SC { // <- proxy implements the same interface as the real class
    private final SC sc_impl_;

    public SC_Inst(SC sc_impl) { // <- wraps the real implementation
        this.sc_impl_ = sc_impl;
    }

    public void doSomething() { // <- instrumentation code around the real call
        log.i("Counter: " + this.sc_impl_.getCounter());
        this.sc_impl_.doSomething(); // <- proxy calls the real implementation
    }
}
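The proxy snippet above can be made self-contained; a minimal sketch assuming an SC interface with doSomething/getCounter and a trivial real implementation (the names SC_Impl and ProxyDemo are hypothetical), using System.out in place of the logging framework:

```java
interface SC {
    void doSomething();
    int getCounter();
}

// real implementation, unaware of any instrumentation
class SC_Impl implements SC {
    private int counter = 0;
    public void doSomething() { counter++; }
    public int getCounter() { return counter; }
}

// proxy instrumentation: wraps the real object and logs around each call
class SC_Inst implements SC {
    private final SC impl;
    SC_Inst(SC impl) { this.impl = impl; }
    public void doSomething() {
        System.out.println("Counter before call: " + impl.getCounter());
        impl.doSomething(); // forward to the real implementation
    }
    public int getCounter() { return impl.getCounter(); }
}

public class ProxyDemo {
    public static void main(String[] args) {
        // client code only sees the SC interface, so instrumentation can be
        // added or removed by swapping which object is constructed here
        SC sc = new SC_Inst(new SC_Impl());
        sc.doSomething();
        sc.doSomething();
        System.out.println("Final counter: " + sc.getCounter());
    }
}
```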
  • Logs: record of events or messages that occur in a software system
    • Application logs: provides information about the behaviour of the system (e.g. errors, warnings, status updates)
    • System logs: generated by the operating system or infrastructure components, such as log messages from system services, system events, or performance metrics
  • Metrics: numerical value that describes the performance or behaviour of a software system (e.g. response time, system resources, user engagement/conversion rate, …)
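A metric such as "number of API calls" can be collected with a simple counter; a minimal sketch (the class and method names are assumptions), using a thread-safe AtomicLong:

```java
import java.util.concurrent.atomic.AtomicLong;

public class ApiMetrics {
    // the metric: a numerical value describing system behaviour
    private static final AtomicLong apiCalls = new AtomicLong();

    static void handleRequest() {
        apiCalls.incrementAndGet(); // count every call
        // ... actual request handling would go here
    }

    static long callCount() { return apiCalls.get(); }

    public static void main(String[] args) {
        for (int i = 0; i < 3; i++) handleRequest();
        System.out.println("api_calls=" + callCount());
    }
}
```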
  • Trace: provides a complete picture of the path taken by a request as it travels through a system; used for debugging (helps us understand control flow, timing, interactions)
    • Events (system-level): low-level system/application events used to analyze performance and behaviour on a single machine
    • Distributed tracing: tracks a request as it moves across multiple services to understand end-to-end flow and latency in distributed systems
  • Problems with instrumentation & data analysis:
    • different tracing tools use incompatible formats, making cross-system analysis hard
    • trace data grows fast and is expensive to store and query
    • large trace datasets are difficult to visualize and interpret
    • traces are hard to link with logs and metrics
    • many tools cannot provide instant feedback
    • hard to track events across multiple nodes or services
  • Distributed tracing: a process of collecting end-to-end transaction graphs in near real time
    • Trace: represents the entire journey of a request
    • Span: represents a single operation call
    • Tags/logs: annotate the spans with some contextual information (tags apply to the whole span; logs represent some event that happened during the span)
      • a log always has a timestamp that falls within the span’s start-end time interval
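The trace/span/tag/log structure above can be sketched as a small class (all names and fields are assumptions; a real system would use a tracing library such as OpenTelemetry rather than hand-rolling this):

```java
import java.util.*;

public class Span {
    final String operation;   // a span represents a single operation call
    final long startMs;
    long endMs = -1;
    final Map<String, String> tags = new HashMap<>();             // tags: apply to the whole span
    final List<Map.Entry<Long, String>> logs = new ArrayList<>(); // logs: timestamped events in the span

    Span(String operation) {
        this.operation = operation;
        this.startMs = System.currentTimeMillis();
    }

    void setTag(String key, String value) { tags.put(key, value); }

    // a log's timestamp always falls within the span's start-end interval
    void log(String event) { logs.add(Map.entry(System.currentTimeMillis(), event)); }

    void finish() { endMs = System.currentTimeMillis(); }

    public static void main(String[] args) {
        Span span = new Span("GET /orders");
        span.setTag("http.status", "200");
        span.log("cache miss");
        span.finish();
        System.out.println(span.operation + " took " + (span.endMs - span.startMs) + " ms");
    }
}
```

A full trace would then be the tree of such spans produced by one request as it crosses services.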
  • Monolith architecture: single, tightly coupled application structure (all components in one unit)
    • pros:
      • simple to build, test, and deploy
      • easy to use for small systems
    • cons:
      • difficult to scale or modify as codebase grows
      • a change in one part can break the whole system
      • slow deployment cycles and limited flexibility
      • performance bottlenecks due to shared resources
  • Microservice architecture: application is split into independent, loosely coupled services; each service runs in its own process and communicates via APIs
    • pros:
      • scale only the parts that need it (scalability)
      • failures are isolated
      • use different languages or technologies (flexibility)
      • update services independently (fast deployment)
      • easy to test and maintain (modularity)

Note

Ended slide 44