Fixing N+1 Query Issue In Dashboard Service

by Alex Johnson

The Dashboard Service hit a critical performance bottleneck caused by the infamous N+1 query problem, specifically in its subordinates list functionality. This article walks through the issue, its impact, the fix we implemented, and the performance improvements we measured.

Understanding the N+1 Query Problem

At its core, the N+1 query problem arises when an application needs to fetch related data for a set of entities. Imagine retrieving a list of employees and, for each employee, needing to fetch their department, goals, self-assessment, and supervisor feedback. A naive approach first fetches the list of employees (1 query) and then, for each of the N employees, executes individual queries to retrieve their related data. This results in 1 initial query plus at least N additional queries (one per employee per relation), hence the name N+1. This pattern can lead to a significant performance hit, especially as the number of entities (N) grows.

In our case, the N+1 query problem manifested in the _get_subordinates_list() method within backend/app/services/dashboard_service.py, specifically on lines 446-544. The initial implementation triggered four queries for each subordinate, so the query count climbed rapidly as the number of subordinates grew.

The Impact of the N+1 Query Problem

The consequences of the N+1 query problem can be severe. In our Dashboard Service, the impact was directly proportional to the number of subordinates. Let's illustrate this with a few scenarios:

  • 10 subordinates: 1 + (10 × 4) = 41 queries
  • 50 subordinates: 1 + (50 × 4) = 201 queries
  • 100 subordinates: 1 + (100 × 4) = 401 queries

As you can see, the number of queries skyrockets with the number of subordinates, leading to increased database load and slower response times. Our observations confirmed this, with slow request logs showing response times of up to 3.8 seconds for the GET /api/v1/auth/user/... endpoint. This level of performance degradation is unacceptable for a responsive dashboard service.

Analyzing the Culprit Code

The problematic code resided within the _get_subordinates_list() method. The original implementation looped through each subordinate and executed individual queries to fetch related data. Here's a snippet of the problematic code:

for subordinate in subordinates:
    # Query 1: Department lookup
    if subordinate.department_id:
        department = await self.department_repo.get_by_id(subordinate.department_id, org_id)

    # Query 2: Goals lookup for this single subordinate (filter details elided)
    goals_query = select(Goal).where(...)

    # Query 3: Self-assessment lookup
    assessment_query = select(SelfAssessment).where(...)

    # Query 4: Feedback lookup
    feedback_query = select(SupervisorFeedback).where(...)

    # ...each of these is then executed (awaited) separately on every iteration

This loop clearly demonstrates the N+1 query problem. For each subordinate, four additional queries were executed: one for the department, one for goals, one for self-assessment, and one for feedback. This inefficient approach was the primary driver of the performance bottleneck.

The Solution: Batch Queries to the Rescue

To address the N+1 query problem, we adopted a strategy of batch queries. Instead of querying for each subordinate's data individually, we restructured the code to fetch all related data in bulk using a minimal number of queries. This approach drastically reduces the number of database interactions and significantly improves performance.

Implementing Batch Querying

The revised _get_subordinates_list() method now employs the following steps:

  1. Fetch Subordinates with Eager Loading: The initial query retrieves all subordinates along with their related data (department, stage, roles) using eager loading, so none of those relations needs to be fetched inside the per-subordinate loop later (a sketch of what this eager loading might look like follows this list).

    subordinates = await self.user_repo.get_subordinates(
        supervisor_id, org_id, with_relations=True  # Loads department, stage, roles
    )
    
  2. Prepare for Batch Fetching: We extract the IDs of all subordinates to be used in subsequent batch queries.

    subordinate_ids = [s.id for s in subordinates]
    
  3. Batch Fetch Related Data: Three batch queries are executed to fetch all goals, self-assessments, and feedbacks for all subordinates in one go. This is the core of the optimization.

    # 1. Batch fetch all goals
    goals_query = select(Goal).where(
        and_(Goal.user_id.in_(subordinate_ids), Goal.period_id == period_id)
    )
    all_goals = (await self.session.execute(goals_query)).scalars().all()
    goals_by_user = defaultdict(list)
    for goal in all_goals:
        goals_by_user[goal.user_id].append(goal)
    
    # 2. Batch fetch all self-assessments
    assessments_query = select(SelfAssessment).where(...)
    all_assessments = (await self.session.execute(assessments_query)).scalars().all()
    assessments_by_user = {a.user_id: a for a in all_assessments}
    
    # 3. Batch fetch all feedbacks
    feedbacks_query = select(SupervisorFeedback).where(...)
    all_feedbacks = (await self.session.execute(feedbacks_query)).scalars().all()
    feedbacks_by_user = defaultdict(list)
    for feedback in all_feedbacks:
        feedbacks_by_user[feedback.employee_id].append(feedback)
    
  4. Process Data Efficiently: Finally, we loop through the subordinates and access their related data from the pre-fetched collections. This eliminates the need for additional queries within the loop.

    for subordinate in subordinates:
        department = subordinate.department  # Already loaded
        goals = goals_by_user[subordinate.id]
        assessment = assessments_by_user.get(subordinate.id)
        feedbacks = feedbacks_by_user[subordinate.id]
        # ...
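
Step 1 relies on the with_relations option of the user repository. The real implementation isn't shown here; as a rough illustration, the eager loading could look something like the sketch below, which assumes a SQLAlchemy User model with department, stage, and roles relationships and supervisor_id / organization_id columns (the model and column names are hypothetical):

from sqlalchemy import select
from sqlalchemy.orm import joinedload, selectinload

async def get_subordinates(self, supervisor_id, org_id, with_relations=False):
    # Hypothetical repository method: one SELECT for the direct reports,
    # with the relations the dashboard needs loaded eagerly.
    query = select(User).where(
        User.supervisor_id == supervisor_id,    # assumed column names
        User.organization_id == org_id,
    )
    if with_relations:
        query = query.options(
            joinedload(User.department),   # many-to-one: folded into the same SELECT
            joinedload(User.stage),
            selectinload(User.roles),      # collection: one extra IN query
        )
    return (await self.session.execute(query)).scalars().all()

Joined eager loading keeps the many-to-one relations inside the single SELECT, while selectinload adds one small extra query for the roles collection; either way the query count stays constant no matter how many subordinates a supervisor has.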
    

The Optimized Code

Putting the pieces together, here is the restructured method (filter details and the final response assembly are elided for brevity):

# Module-level imports used below (Goal, SelfAssessment and SupervisorFeedback
# are the application's ORM models; their imports are omitted here)
from collections import defaultdict
from uuid import UUID

from sqlalchemy import and_, select

async def _get_subordinates_list(self, supervisor_id: UUID, org_id: str, ...):
    # Get subordinates with eager loading (1 query)
    subordinates = await self.user_repo.get_subordinates(
        supervisor_id, org_id, with_relations=True  # Loads department, stage, roles
    )
    
    subordinate_ids = [s.id for s in subordinates]
    
    # Batch fetch all related data (3 queries total)
    # 1. Batch fetch all goals
    goals_query = select(Goal).where(
        and_(Goal.user_id.in_(subordinate_ids), Goal.period_id == period_id)
    )
    all_goals = (await self.session.execute(goals_query)).scalars().all()
    goals_by_user = defaultdict(list)
    for goal in all_goals:
        goals_by_user[goal.user_id].append(goal)
    
    # 2. Batch fetch all self-assessments
    assessments_query = select(SelfAssessment).where(...)
    all_assessments = (await self.session.execute(assessments_query)).scalars().all()
    assessments_by_user = {a.user_id: a for a in all_assessments}
    
    # 3. Batch fetch all feedbacks
    feedbacks_query = select(SupervisorFeedback).where(...)
    all_feedbacks = (await self.session.execute(feedbacks_query)).scalars().all()
    feedbacks_by_user = defaultdict(list)
    for feedback in all_feedbacks:
        feedbacks_by_user[feedback.employee_id].append(feedback)
    
    # Loop through subordinates using pre-fetched data (no queries in loop!)
    for subordinate in subordinates:
        department = subordinate.department  # Already loaded
        goals = goals_by_user[subordinate.id]
        assessment = assessments_by_user.get(subordinate.id)
        feedbacks = feedbacks_by_user[subordinate.id]
        # ...
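
One small refinement that isn't shown in the listing above: if a supervisor has no subordinates at all, subordinate_ids is empty and the three batch queries are pure overhead, so the method can bail out early. A minimal sketch (assuming the method returns a list of subordinate entries):

subordinate_ids = [s.id for s in subordinates]
if not subordinate_ids:
    # No direct reports: skip the batch queries entirely
    return []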

Quantifiable Performance Improvements

The results of implementing batch queries were remarkable. We observed a significant reduction in the number of queries and a dramatic improvement in response times.

Query Reduction

  • Before: 1 + (N × 4) queries
  • After: 4-5 queries total
  • Query reduction: 90-99% depending on N

Response Time Improvement

  • Before: 3.8s
  • After: 0.5-0.8s
  • Estimated response-time reduction: 70-85%

Overall Dashboard Impact

For a supervisor with 50 subordinates:

  • Subordinates List: 201 → 4-5 queries (-196 queries)
  • Overall dashboard: ~217 → ~7-8 queries (96% reduction)

These numbers clearly demonstrate the effectiveness of the solution. By switching to batch queries, we achieved a massive reduction in database load and significantly improved the responsiveness of the Dashboard Service.

Ensuring the Fix: Testing and Verification

To ensure the fix was implemented correctly and to validate the performance improvements, we conducted thorough testing. The testing process included the following steps:

  1. Enable Query Logging: We enabled query logging by setting LOG_LEVEL to DEBUG, which let us watch exactly which queries were being executed (one common way to surface SQLAlchemy's SQL output is sketched after this list).
  2. Access Supervisor Dashboard Endpoint: We accessed the supervisor dashboard endpoint to trigger the _get_subordinates_list() method.
  3. Verify Query Count: We compared the query count before and after the fix to confirm the reduction in the number of queries.
  4. Measure Response Time Improvement: We measured the response time before and after the fix to quantify the performance gains.
  5. Monitor Supabase Query Performance Dashboard: We monitored the Supabase Query Performance dashboard to ensure the fix had a positive impact on the overall database performance.
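
For step 1, the exact mechanism depends on how LOG_LEVEL is wired into the application's logging configuration; a common way to surface every SQL statement SQLAlchemy emits is to raise the sqlalchemy.engine logger, for example:

import logging

# INFO logs each SQL statement; DEBUG additionally logs result rows.
# (How LOG_LEVEL maps onto this is specific to the application's setup.)
logging.basicConfig()
logging.getLogger("sqlalchemy.engine").setLevel(logging.INFO)

# Equivalent effect when creating the engine:
# engine = create_async_engine(DATABASE_URL, echo=True)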

The testing results aligned with our expectations, confirming the significant reduction in query count and the substantial improvement in response time. This gave us confidence that the N+1 query problem had been effectively resolved.
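
To keep the regression from quietly creeping back in, a query-count assertion can also be added to the test suite. Below is a minimal sketch using SQLAlchemy's event hooks; the engine fixture, the service call, and the threshold are assumptions for illustration, not part of the actual test code:

from sqlalchemy import event

class QueryCounter:
    """Counts SQL statements executed on the given async engine."""

    def __init__(self, engine):
        self.engine = engine.sync_engine  # event hooks attach to the sync facade
        self.count = 0

    def _increment(self, conn, cursor, statement, parameters, context, executemany):
        self.count += 1

    def __enter__(self):
        event.listen(self.engine, "before_cursor_execute", self._increment)
        return self

    def __exit__(self, *exc):
        event.remove(self.engine, "before_cursor_execute", self._increment)

# Usage inside a test (names assumed):
# with QueryCounter(engine) as counter:
#     await dashboard_service._get_subordinates_list(supervisor_id, org_id, ...)
# assert counter.count <= 8, f"expected a handful of queries, got {counter.count}"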

Conclusion: A Major Step Forward

Addressing the N+1 query problem in the Dashboard Service was a critical step towards improving its performance and scalability. By implementing batch queries, we achieved a dramatic reduction in database load and a significant improvement in response times. This not only enhances the user experience but also ensures that the service can handle a growing number of users and data without performance degradation.

This experience underscores the importance of understanding and addressing performance bottlenecks early in the development process. Techniques like batch querying, eager loading, and proper database indexing are crucial for building efficient and scalable applications. By proactively identifying and resolving issues like the N+1 query problem, we can ensure that our services deliver a smooth and responsive experience for our users.

For more information on optimizing database queries and avoiding the N+1 query problem, you can check out resources like the PostgreSQL Wiki on Performance Tips.