Improve Performance: Caching Worktree Mood Classification

by Alex Johnson

In this article, we'll walk through adding caching to the commit-age lookup behind the categorizeWorktree function. This function drives mood classification for worktrees, and caching its Git query removes redundant subprocess calls on every polling cycle.

Understanding the Need for Caching

To begin, let's understand why caching is essential in this context. The categorizeWorktree function relies on getLastCommitAgeInDays(), which in turn executes git log -1 for each mood classification. Mood is recalculated on every polling cycle (every 2 seconds in active mode, every 10 seconds in the background), so the same Git command runs over and over even though its result rarely changes.

Identifying the Problem

Who is affected? Users with numerous worktrees or rapid polling intervals are most likely to experience performance bottlenecks.

How often? This issue arises during every polling cycle, which occurs every 2 seconds in active mode and 10 seconds in the background for each worktree.

Current behavior: Each invocation of categorizeWorktree triggers a new git log -1 command, leading to unnecessary overhead.

Impact: The repeated spawning of subprocesses and Git operations can strain system resources and slow down performance.

Contextualizing the Mood Classification Flow

Let's examine the mood classification flow to better understand where caching can be applied:

  1. WorktreeMonitor.updateGitStatus() is invoked on each poll cycle, initiating the process.
  2. This, in turn, calls categorizeWorktree(worktree, changes, mainBranch), which is the focal point of our optimization efforts.
  3. Within categorizeWorktree, getLastCommitAgeInDays(worktree.path) is called to determine the age of the last commit.
  4. Finally, getLastCommitAgeInDays executes git log -1 --format=%ci via the simple-git library to retrieve the commit information.

Key Observation

The critical observation here is that the commit age only changes in two ways:

  • When a new commit is made, which moves HEAD and resets the age.
  • Slowly with the passage of time: the age in days grows while HEAD stays put, so a cached value only needs refreshing infrequently (for example, once per day) to keep the staleness check correct.

This predictability makes it an ideal candidate for caching, as we can store the commit age and reuse it until one of these changes occurs.

Analyzing the Current State

Before diving into implementing caching, let's examine the existing code:

Relevant Code Snippets

File: src/utils/worktreeMood.ts#L35-L65

export async function categorizeWorktree(
  worktree: Worktree,
  changes: WorktreeChanges | undefined,
  mainBranch: string,
  staleThresholdDays: number = 7
): Promise<WorktreeMood> {
  // ... logic ...

  // This runs git log every time
  const ageDays = await getLastCommitAgeInDays(worktree.path);
  if (ageDays !== null && ageDays > staleThresholdDays) {
    return 'stale';
  }
  // ...
}

This snippet highlights the critical line where getLastCommitAgeInDays is called, triggering the Git operation on each invocation.

File: src/utils/git.ts - getLastCommitAgeInDays function

This file contains the implementation of getLastCommitAgeInDays, which we'll need to modify to incorporate caching.
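The article does not show the body of getLastCommitAgeInDays, so the following is only a minimal sketch of the date arithmetic such a function needs, assuming it receives the text printed by git log -1 --format=%ci (for example "2024-01-15 10:30:00 +0100"). The names parseCommitDate and wholeDaysSince are hypothetical, and rounding down to whole days is an assumption rather than the project's confirmed behavior.

```typescript
const MS_IN_ONE_DAY = 24 * 60 * 60 * 1000;

// Converts `%ci` output such as "2024-01-15 10:30:00 +0100" into epoch
// milliseconds, or returns null when the text is empty or has another shape.
function parseCommitDate(raw: string): number | null {
  const match = /^(\d{4}-\d{2}-\d{2}) (\d{2}:\d{2}:\d{2}) ([+-]\d{2})(\d{2})$/.exec(raw.trim());
  if (!match) return null;
  const [, date, time, offsetHours, offsetMinutes] = match;
  // Rebuild the value as ISO 8601 so Date.parse handles it consistently.
  const ms = Date.parse(`${date}T${time}${offsetHours}:${offsetMinutes}`);
  return Number.isNaN(ms) ? null : ms;
}

// Whole days elapsed between the commit and `now` (hypothetical helper).
function wholeDaysSince(commitMs: number, now: number = Date.now()): number {
  return Math.floor((now - commitMs) / MS_IN_ONE_DAY);
}
```

With the default threshold of 7 days from categorizeWorktree, a commit whose wholeDaysSince value is 8 would be classified as stale, while one at 7 would not.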

Proposing Deliverables: Caching Strategies

Now, let's explore two potential caching strategies:

Option A: Cache by (path, HEAD SHA)

This approach caches the commit age per worktree path and stores the HEAD SHA (the hash of the commit that HEAD points to) alongside it. A cached entry is reused only while HEAD still matches, so the cache is invalidated as soon as a new commit is made.

Code Changes

To implement this strategy, we can add a cache in src/utils/git.ts:

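// Cache: Map<worktreePath, { headSha, ageDays, timestamp }>
type CommitAgeEntry = { headSha: string; ageDays: number | null; timestamp: number };
const commitAgeCache = new Map<string, CommitAgeEntry>();

// Assumed refresh interval: the age keeps growing while HEAD is unchanged,
// so an entry is recomputed once it is older than this.
const COMMIT_AGE_MAX_ENTRY_MS = 60 * 60 * 1000;

export async function getLastCommitAgeInDays(worktreePath: string): Promise<number | null> {
  const git = simpleGit(worktreePath);
  // Still one Git call per invocation; on a hit it replaces git log.
  const headSha = (await git.revparse(['HEAD'])).trim();

  const cached = commitAgeCache.get(worktreePath);
  const fresh = cached !== undefined && Date.now() - cached.timestamp < COMMIT_AGE_MAX_ENTRY_MS;
  if (cached && fresh && cached.headSha === headSha) {
    return cached.ageDays;
  }

  // Compute and cache (computeCommitAge stands for the existing git log logic)
  const ageDays = await computeCommitAge(git);
  commitAgeCache.set(worktreePath, { headSha, ageDays, timestamp: Date.now() });
  return ageDays;
}

In this implementation:

  • We maintain a commitAgeCache Map to store cached commit ages, keyed by the worktree path. Each entry records the HEAD SHA it was computed for and when it was stored.
  • When getLastCommitAgeInDays is called, we read the current HEAD SHA with git rev-parse HEAD and compare it with the cached entry. This is still one Git call per invocation; the saving is that git log is skipped on a hit.
  • If the SHA matches and the entry is younger than COMMIT_AGE_MAX_ENTRY_MS (an assumed interval of one hour), we return the cached ageDays. Without that age check, a worktree whose HEAD never moves would keep its first cached age forever and never become stale. Otherwise, we compute the commit age, cache it, and return the result.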
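As a variation on Option A, the cache can store the commit timestamp instead of the computed age, so the age is always derived from the current time and stays correct between commits. The sketch below is not the project's code: createHeadKeyedAge and both reader parameters are hypothetical stand-ins for the real Git calls, injected here so the caching logic can be exercised without a repository.

```typescript
type HeadReader = (worktreePath: string) => Promise<string>;
type CommitTimeReader = (worktreePath: string) => Promise<number | null>;

const DAY_MS = 24 * 60 * 60 * 1000;

// Builds an age lookup that re-reads the commit time only when HEAD moves.
// `readHead` and `readCommitTimeMs` are placeholders for the real Git queries.
function createHeadKeyedAge(
  readHead: HeadReader,
  readCommitTimeMs: CommitTimeReader,
  now: () => number = Date.now,
) {
  const cache = new Map<string, { headSha: string; commitMs: number | null }>();

  return async function ageInDays(worktreePath: string): Promise<number | null> {
    const headSha = await readHead(worktreePath);
    let entry = cache.get(worktreePath);
    if (!entry || entry.headSha !== headSha) {
      // Cache miss or HEAD moved: fetch the commit time once and remember it.
      entry = { headSha, commitMs: await readCommitTimeMs(worktreePath) };
      cache.set(worktreePath, entry);
    }
    // Derive the age from the stored timestamp on every call.
    return entry.commitMs === null ? null : (now() - entry.commitMs) / DAY_MS;
  };
}
```

The returned age is fractional here; whether to round it before comparing against staleThresholdDays depends on how the existing function behaves.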

Option B: Simple TTL Cache (Simpler)

Alternatively, we can implement a simpler caching mechanism using a Time-To-Live (TTL) approach. Results are cached for a fixed duration, such as 60 seconds. Because staleness is measured in days, a value that is up to a minute old does not change the classification in practice.

Code Changes

Here's how we can implement the TTL-based cache:

const COMMIT_AGE_TTL_MS = 60 * 1000; // 60 seconds
const commitAgeCache = new Map<string, { ageDays: number | null; expires: number }>();

export async function getLastCommitAgeInDays(worktreePath: string): Promise<number | null> {
  const cached = commitAgeCache.get(worktreePath);
  if (cached && Date.now() < cached.expires) {
    return cached.ageDays;
  }

  // Same placeholder helper as in Option A; it stands for the existing git log logic
  const ageDays = await computeCommitAge(simpleGit(worktreePath));
  commitAgeCache.set(worktreePath, { ageDays, expires: Date.now() + COMMIT_AGE_TTL_MS });
  return ageDays;
}

In this approach:

  • We use a commitAgeCache Map to store cached commit ages along with their expiration timestamps.
  • When getLastCommitAgeInDays is called, we check if a cached entry exists and if it hasn't expired yet.
  • If the cache is valid, we return the cached ageDays. Otherwise, we compute the commit age, cache it with an expiration time, and return the result.
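To make the expiry behavior easy to verify, the same TTL logic can be written as a wrapper with an injectable clock. This is a sketch under assumptions: withTtl and the load parameter are hypothetical names, and load stands in for whatever Git-backed lookup the project actually uses.

```typescript
type AgeLoader = (worktreePath: string) => Promise<number | null>;

// Wraps a loader so each path is recomputed at most once per `ttlMs`.
// `now` defaults to the real clock and can be replaced in tests.
function withTtl(load: AgeLoader, ttlMs: number, now: () => number = Date.now): AgeLoader {
  const cache = new Map<string, { ageDays: number | null; expires: number }>();

  return async (worktreePath) => {
    const cached = cache.get(worktreePath);
    if (cached && now() < cached.expires) {
      return cached.ageDays; // still within the TTL window
    }
    const ageDays = await load(worktreePath);
    cache.set(worktreePath, { ageDays, expires: now() + ttlMs });
    return ageDays;
  };
}
```

One design note: overlapping calls for the same path that arrive during a miss each invoke load, because this sketch does not share in-flight promises. With one poll per worktree every few seconds that is unlikely to matter.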

Comparing the Options

Both options have their merits:

  • Option A (Cache by path and HEAD SHA) provides more precise cache invalidation: the cache is refreshed as soon as a new commit is made, which helps when commits are frequent. The trade-off is that it still needs one git rev-parse HEAD call per invocation to detect the change.
  • Option B (Simple TTL cache) is simpler to implement and spawns no Git process at all on a cache hit. It is sufficient when a delay of up to one TTL period before a new commit is reflected is acceptable.

Importance of Testing

Regardless of the chosen caching strategy, thorough testing is crucial to ensure its correctness and effectiveness. Here are some tests we should consider:

  • Test that the cache returns the same value within the TTL (for Option B) or until the HEAD SHA changes (for Option A).
  • Test that the cache invalidates after the TTL expires (for Option B) or when the HEAD changes (for Option A).
  • Verify that mood updates still work correctly after implementing caching.

These tests will help us validate the caching behavior and ensure that it meets our requirements.

Technical Specifications

Let's outline the technical specifications for this caching implementation:

  • Footprint: The changes will primarily affect src/utils/git.ts and potentially src/utils/worktreeMood.ts.
  • Performance:
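    • Before caching: 1 git log subprocess per worktree per poll cycle.
    • After caching with Option B: at most 1 Git subprocess per worktree per TTL period.
    • After caching with Option A: git log runs only when HEAD changes, but the git rev-parse HEAD check shown in the Option A code still runs once per worktree per poll cycle.

With a 60-second TTL and 2-second polling, Option B cuts the commit-age Git calls for each worktree from 30 per minute to 1.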
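The figures above can be turned into a quick back-of-the-envelope calculation. The worktree count of 10 is an assumed number chosen for illustration; the 2-second, 10-second, and 60-second intervals come from the polling modes and the TTL discussed in this article.

```typescript
// Git subprocesses spawned per minute when each worktree triggers one
// Git call every `intervalSeconds`.
function spawnsPerMinute(worktreeCount: number, intervalSeconds: number): number {
  return worktreeCount * (60 / intervalSeconds);
}

const worktreeCount = 10; // assumed, for illustration only

const uncachedActive = spawnsPerMinute(worktreeCount, 2); // 2 s active polling
const uncachedBackground = spawnsPerMinute(worktreeCount, 10); // 10 s background polling
const cachedWithTtl = spawnsPerMinute(worktreeCount, 60); // 60 s TTL (Option B)

console.log({ uncachedActive, uncachedBackground, cachedWithTtl });
// { uncachedActive: 300, uncachedBackground: 60, cachedWithTtl: 10 }
```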

Tasks Involved

To implement caching, we need to perform the following tasks:

  • Add the caching mechanism to getLastCommitAgeInDays in src/utils/git.ts.
  • Choose an invalidation strategy (HEAD-based or TTL-based) based on the project's needs and constraints.
  • Implement cache invalidation when a worktree is removed to prevent stale data.
  • Add tests to verify the caching behavior, as discussed earlier.
  • Verify that mood classification still works correctly after the changes.

Defining Acceptance Criteria

To ensure the successful implementation of caching, we need to define clear acceptance criteria:

  • git log -1 should not be called on every poll cycle, indicating that caching is effective.
  • The cache should be invalidated appropriately, either based on TTL or HEAD changes.
  • Mood classification should remain accurate after the caching implementation.
  • Tests should verify the caching behavior, covering various scenarios.
  • There should be no memory leaks due to unbounded cache growth, ensuring long-term stability.

Addressing Edge Cases and Risks

Before finalizing the implementation, let's consider potential edge cases and risks:

  • Risk: Stale cache showing incorrect mood. To mitigate this, we can use a reasonable TTL (30-60 seconds) in Option B or rely on HEAD-based invalidation in Option A.
  • Edge case: Worktree removed while cached. We should clear the cache entry on a worktree removal event to prevent issues.
  • Edge case: Fast commits. HEAD-based invalidation in Option A automatically handles this scenario.

By addressing these edge cases and risks, we can ensure a robust caching implementation.
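The removal edge case and the unbounded-growth concern can both be handled with two small helpers. This is a sketch only: forgetWorktree and pruneTo are hypothetical names, ageCache mirrors the shape of the Option B cache, and how the project signals a worktree removal is not shown in this article.

```typescript
const ageCache = new Map<string, { ageDays: number | null; expires: number }>();

// Drops the entry for one worktree, for example from a removal event handler.
// Returns true when an entry was actually removed.
function forgetWorktree(worktreePath: string): boolean {
  return ageCache.delete(worktreePath);
}

// Drops every entry whose path is not in the active list, which bounds the
// cache size to the number of live worktrees. Returns the number removed.
function pruneTo(activePaths: readonly string[]): number {
  const active = new Set(activePaths);
  let removed = 0;
  for (const path of Array.from(ageCache.keys())) {
    if (!active.has(path)) {
      ageCache.delete(path);
      removed += 1;
    }
  }
  return removed;
}
```

Calling pruneTo with the current worktree list at the end of a poll cycle is one way to cover removals that happen without an explicit event.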

Conclusion

In conclusion, caching the commit-age lookup used by categorizeWorktree removes most of the redundant Git operations from the polling loop. Whether we choose a HEAD-based or TTL-based caching strategy, thorough testing and consideration of edge cases are essential for a successful implementation. For further reading on caching strategies and performance optimization, check out Caching Basics on Mozilla Developer Network.