Aggregation & Grouping Interview Questions
Comprehensive aggregation & grouping interview questions and answers for SQL. Prepare for your next job interview with expert guidance.
Questions Overview
1. What is the purpose of the GROUP BY clause in SQL? (Basic)
2. What are the five basic aggregate functions in SQL? (Basic)
3. What is the difference between COUNT(*) and COUNT(column_name)? (Basic)
4. What is the purpose of the HAVING clause? (Basic)
5. How does DISTINCT work with aggregate functions? (Moderate)
6. What is the difference between WHERE and HAVING clauses? (Moderate)
7. How do NULL values affect different aggregate functions? (Moderate)
8. What is a window function and how does it differ from regular aggregation? (Advanced)
9. How can you calculate running totals in SQL? (Advanced)
10. What is the purpose of GROUPING SETS? (Advanced)
11. How do you handle division by zero in aggregate calculations? (Moderate)
12. What is the CUBE operator and when would you use it? (Advanced)
13. How can you find the mode (most frequent value) in SQL? (Moderate)
14. What is the ROLLUP operator and how does it differ from CUBE? (Advanced)
15. How do you calculate percentages within groups? (Advanced)
16. What is the difference between ROW_NUMBER(), RANK(), and DENSE_RANK()? (Advanced)
17. How can you find groups that have specific patterns or conditions? (Moderate)
18. What is the purpose of FIRST_VALUE and LAST_VALUE functions? (Advanced)
19. How do you handle timezone differences in GROUP BY operations with timestamps? (Advanced)
20. What is the difference between LAG() and LEAD() functions? (Advanced)
21. How can you identify outliers within groups? (Advanced)
22. What is the importance of ORDER BY in window functions? (Moderate)
23. How do you calculate moving averages in SQL? (Advanced)
24. What is the difference between ROWS and RANGE in window functions? (Advanced)
25. How do you handle concatenation of values within groups? (Moderate)
26. What is the purpose of the FILTER clause in aggregate functions? (Advanced)
27. How do you calculate median values in SQL? (Advanced)
28. What is the difference between aggregate and analytic functions? (Moderate)
29. How can you pivot data using aggregate functions? (Advanced)
1. What is the purpose of the GROUP BY clause in SQL? (Basic)
The GROUP BY clause is used to group rows that have the same values in specified columns into summary rows. It is typically used with aggregate functions to perform calculations on each group of rows rather than the entire table.
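As a minimal sketch of grouping, the snippet below runs a GROUP BY query through Python's built-in sqlite3 module. The sales table, its columns, and the data are hypothetical and exist only for illustration.

```python
import sqlite3

# Hypothetical table and data, for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, amount INTEGER)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?)",
    [("East", 100), ("East", 50), ("West", 70)],
)

# One output row per distinct region; the aggregates run per group.
group_rows = conn.execute(
    """
    SELECT region, COUNT(*) AS orders, SUM(amount) AS total
    FROM sales
    GROUP BY region
    ORDER BY region
    """
).fetchall()
print(group_rows)  # [('East', 2, 150), ('West', 1, 70)]
```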
2. What are the five basic aggregate functions in SQL? (Basic)
The five basic aggregate functions in SQL are COUNT(), SUM(), AVG(), MAX(), and MIN(). These functions perform calculations across a set of rows and return a single value.
3. What is the difference between COUNT(*) and COUNT(column_name)? (Basic)
COUNT(*) counts every row in the group, including rows in which some or all columns are NULL, while COUNT(column_name) counts only the rows where the specified column is not NULL. The two results differ whenever the column contains NULL values.
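The difference can be seen in a small sketch using Python's built-in sqlite3 module. The employees table and its manager_id column are hypothetical names chosen for the example.

```python
import sqlite3

# Hypothetical table: one employee has no manager (NULL).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (name TEXT, manager_id INTEGER)")
conn.executemany(
    "INSERT INTO employees VALUES (?, ?)",
    [("Ann", None), ("Bob", 1), ("Cy", 1)],
)

# COUNT(*) counts rows; COUNT(manager_id) skips the NULL.
count_row = conn.execute(
    "SELECT COUNT(*), COUNT(manager_id) FROM employees"
).fetchone()
print(count_row)  # (3, 2)
```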
4. What is the purpose of the HAVING clause? (Basic)
The HAVING clause is used to filter groups based on aggregate function results. It's similar to WHERE but operates on groups rather than individual rows and can use aggregate functions in its conditions.
5. How does DISTINCT work with aggregate functions? (Moderate)
When DISTINCT is used with aggregate functions (e.g., COUNT(DISTINCT column)), it counts or aggregates only unique values in the specified column, eliminating duplicates before performing the aggregation.
6. What is the difference between WHERE and HAVING clauses? (Moderate)
WHERE filters individual rows before grouping, while HAVING filters groups after grouping. HAVING can use aggregate functions in its conditions, but WHERE cannot because it processes rows before aggregation occurs.
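A short sketch of both filters in one query, run through Python's built-in sqlite3 module. The sales table, the thresholds, and the data are assumptions made for illustration.

```python
import sqlite3

# Hypothetical table and data, for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, amount INTEGER)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?)",
    [("East", 100), ("East", 50), ("West", 70), ("West", 5), ("North", 20)],
)

# WHERE drops the 5-unit row before grouping;
# HAVING then drops the North group, whose total is only 20.
having_rows = conn.execute(
    """
    SELECT region, SUM(amount) AS total
    FROM sales
    WHERE amount >= 10
    GROUP BY region
    HAVING SUM(amount) > 50
    ORDER BY region
    """
).fetchall()
print(having_rows)  # [('East', 150), ('West', 70)]
```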
7. How do NULL values affect different aggregate functions? (Moderate)
Aggregate functions handle NULL values differently: COUNT(*) counts rows regardless of NULLs, while COUNT(column), SUM, AVG, MAX, and MIN all ignore NULL inputs. AVG therefore divides by the number of non-NULL values rather than the number of rows, and SUM, AVG, MAX, and MIN return NULL when every input is NULL. This can significantly affect calculation results.
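The sketch below checks this behavior with Python's built-in sqlite3 module. The measurements table and its single column v are hypothetical.

```python
import sqlite3

# Hypothetical table with one NULL among three rows.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE measurements (v INTEGER)")
conn.executemany("INSERT INTO measurements VALUES (?)", [(10,), (None,), (20,)])

# AVG divides by the 2 non-NULL values, not by the 3 rows.
null_row = conn.execute(
    "SELECT COUNT(*), COUNT(v), SUM(v), AVG(v), MIN(v), MAX(v) FROM measurements"
).fetchone()
print(null_row)  # (3, 2, 30, 15.0, 10, 20)

# When every input is NULL, SUM and AVG return NULL (None in Python).
all_null_row = conn.execute(
    "SELECT SUM(v), AVG(v) FROM measurements WHERE v IS NULL"
).fetchone()
print(all_null_row)  # (None, None)
```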
8. What is a window function and how does it differ from regular aggregation? (Advanced)
A window function performs a calculation across a set of rows related to the current row, unlike regular aggregation, which collapses each group into a single output row. Window functions preserve the individual rows while adding aggregate calculations alongside them.
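To make the contrast concrete, here is a sketch that runs the same SUM once with GROUP BY and once as a window function, using Python's built-in sqlite3 module. It assumes a bundled SQLite of 3.25 or newer (needed for window functions); the sales table is hypothetical.

```python
import sqlite3

# Hypothetical table and data, for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, amount INTEGER)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?)",
    [("East", 100), ("East", 50), ("West", 70)],
)

# Regular aggregation: one row per region.
grouped = conn.execute(
    "SELECT region, SUM(amount) FROM sales GROUP BY region ORDER BY region"
).fetchall()
print(grouped)  # [('East', 150), ('West', 70)]

# Window function: every input row is kept, with the region total attached.
windowed = conn.execute(
    """
    SELECT region, amount, SUM(amount) OVER (PARTITION BY region) AS region_total
    FROM sales
    ORDER BY region, amount
    """
).fetchall()
print(windowed)  # [('East', 50, 150), ('East', 100, 150), ('West', 70, 70)]
```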
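9. How can you calculate running totals in SQL? (Advanced)
Running totals can be calculated using window functions with the OVER clause and ORDER BY, such as SUM(value) OVER (ORDER BY date). This creates a cumulative sum while maintaining individual row details. With only ORDER BY, the default frame runs from the start of the partition through the current row and its peers, so rows that share the same date receive the same total.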
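A minimal running-total sketch using Python's built-in sqlite3 module, assuming SQLite 3.25 or newer. The daily table and its columns are hypothetical.

```python
import sqlite3

# Hypothetical table: one amount per day.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE daily (day TEXT, amount INTEGER)")
conn.executemany(
    "INSERT INTO daily VALUES (?, ?)",
    [("2024-01-01", 10), ("2024-01-02", 20), ("2024-01-03", 5)],
)

# SUM ... OVER (ORDER BY day) accumulates from the first day to the current one.
running_rows = conn.execute(
    """
    SELECT day, amount, SUM(amount) OVER (ORDER BY day) AS running_total
    FROM daily
    ORDER BY day
    """
).fetchall()
print(running_rows)
# [('2024-01-01', 10, 10), ('2024-01-02', 20, 30), ('2024-01-03', 5, 35)]
```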
10. What is the purpose of GROUPING SETS? (Advanced)
GROUPING SETS allows you to specify multiple grouping combinations in a single query. It's a shorthand for combining multiple GROUP BY operations with UNION ALL, producing multiple levels of aggregation simultaneously.
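SQLite does not implement GROUPING SETS, so the sketch below (Python's built-in sqlite3 module) runs only the UNION ALL form that the shorthand stands for. The GROUPING SETS query appears in a comment for comparison. The sales table and its data are hypothetical.

```python
import sqlite3

# Hypothetical table and data, for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, product TEXT, amount INTEGER)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?)",
    [("East", "A", 10), ("East", "B", 20), ("West", "A", 30)],
)

# On engines that support it, the equivalent shorthand would be:
#   SELECT region, product, SUM(amount) FROM sales
#   GROUP BY GROUPING SETS ((region), (product));
# Here the two groupings are written out and stacked with UNION ALL.
set_rows = conn.execute(
    """
    SELECT region, NULL AS product, SUM(amount) AS total
    FROM sales GROUP BY region
    UNION ALL
    SELECT NULL, product, SUM(amount)
    FROM sales GROUP BY product
    """
).fetchall()

# Row order is not specified without ORDER BY, so sort for display.
for row in sorted(set_rows, key=repr):
    print(row)
```

The NULL in each row marks the column that was not part of that grouping.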
11. How do you handle division by zero in aggregate calculations? (Moderate)
Division by zero can be handled using NULLIF or CASE expressions within aggregate functions. For example, AVG(value/NULLIF(divisor,0)) prevents division-by-zero errors by converting zero divisors to NULL, which the aggregate then ignores.
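A sketch of the NULLIF pattern using Python's built-in sqlite3 module. The ad_stats table and its columns are hypothetical. Note that SQLite itself returns NULL for division by zero rather than raising an error, so here NULLIF mainly keeps the query portable to engines that do raise one.

```python
import sqlite3

# Hypothetical table: the second row has a zero divisor.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE ad_stats (clicks INTEGER, impressions INTEGER)")
conn.executemany(
    "INSERT INTO ad_stats VALUES (?, ?)",
    [(5, 100), (3, 0), (10, 50)],
)

# NULLIF(impressions, 0) turns the zero into NULL, the division yields NULL,
# and AVG skips that row. 1.0 * forces floating-point division.
avg_ctr = conn.execute(
    "SELECT AVG(1.0 * clicks / NULLIF(impressions, 0)) FROM ad_stats"
).fetchone()[0]
print(avg_ctr)  # about 0.125, the mean of 0.05 and 0.2
```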
12. What is the CUBE operator and when would you use it? (Advanced)
The CUBE operator generates all possible combinations of grouping columns, producing a cross-tabulation report. It's useful for generating subtotals and grand totals across multiple dimensions in data analysis.
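13. How can you find the mode (most frequent value) in SQL? (Moderate)
The mode can be found by counting each value with COUNT and GROUP BY, then selecting the value with the highest count using ORDER BY COUNT(*) DESC and LIMIT 1, or a ranking function such as ROW_NUMBER(). Both return a single value even when several values tie for the highest count; use RANK() instead to keep all tied values.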
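The COUNT, ORDER BY, and LIMIT approach might look like the following sketch, run through Python's built-in sqlite3 module. The votes table and its data are hypothetical, and the data contains no tie for first place.

```python
import sqlite3

# Hypothetical table: 'red' appears three times, more than any other value.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE votes (color TEXT)")
conn.executemany(
    "INSERT INTO votes VALUES (?)",
    [("red",), ("blue",), ("red",), ("green",), ("red",), ("blue",)],
)

# Count each value, sort by frequency, keep the top one.
mode_row = conn.execute(
    """
    SELECT color, COUNT(*) AS n
    FROM votes
    GROUP BY color
    ORDER BY n DESC
    LIMIT 1
    """
).fetchone()
print(mode_row)  # ('red', 3)
```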
14. What is the ROLLUP operator and how does it differ from CUBE? (Advanced)
ROLLUP generates hierarchical subtotals based on the specified columns' order, while CUBE generates all possible combinations. ROLLUP is used for hierarchical data analysis, creating subtotals for each level.
15. How do you calculate percentages within groups? (Advanced)
Percentages within groups can be calculated using window functions, such as SUM(value) OVER (PARTITION BY group) to get the group total, then dividing individual values by this total and multiplying by 100.
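A sketch of the per-group percentage calculation using Python's built-in sqlite3 module, assuming SQLite 3.25 or newer. The sales table, the rep column, and the data are hypothetical.

```python
import sqlite3

# Hypothetical table: two sales reps per region.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, rep TEXT, amount INTEGER)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?)",
    [("East", "a", 30), ("East", "b", 70), ("West", "c", 50), ("West", "d", 150)],
)

# Each row's amount divided by its region total; 100.0 avoids integer division.
pct_rows = conn.execute(
    """
    SELECT region, rep,
           100.0 * amount / SUM(amount) OVER (PARTITION BY region) AS pct
    FROM sales
    ORDER BY region, rep
    """
).fetchall()
print(pct_rows)
# [('East', 'a', 30.0), ('East', 'b', 70.0), ('West', 'c', 25.0), ('West', 'd', 75.0)]
```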
16. What is the difference between ROW_NUMBER(), RANK(), and DENSE_RANK()? (Advanced)
ROW_NUMBER() assigns a unique sequential number to every row, RANK() assigns the same number to tied rows and leaves gaps after them, and DENSE_RANK() assigns the same number to tied rows without leaving gaps. They suit different ranking scenarios within groups.
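The three functions can be compared side by side in a sketch like this one, using Python's built-in sqlite3 module (SQLite 3.25 or newer assumed). The scores table is hypothetical, and the name column is added to the ROW_NUMBER ordering only to make the tie-break deterministic.

```python
import sqlite3

# Hypothetical table: 'a' and 'b' tie on score.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE scores (name TEXT, score INTEGER)")
conn.executemany(
    "INSERT INTO scores VALUES (?, ?)",
    [("a", 90), ("b", 90), ("c", 80), ("d", 70)],
)

rank_rows = conn.execute(
    """
    SELECT name,
           ROW_NUMBER() OVER (ORDER BY score DESC, name) AS row_num,
           RANK()       OVER (ORDER BY score DESC)       AS rnk,
           DENSE_RANK() OVER (ORDER BY score DESC)       AS dense_rnk
    FROM scores
    ORDER BY score DESC, name
    """
).fetchall()
for row in rank_rows:
    print(row)
# ('a', 1, 1, 1)
# ('b', 2, 1, 1)
# ('c', 3, 3, 2)   RANK skips 2, DENSE_RANK does not
# ('d', 4, 4, 3)
```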
17. How can you find groups that have specific patterns or conditions? (Moderate)
Groups with specific patterns can be found using HAVING with aggregate functions to filter groups based on conditions like COUNT(), MIN(), MAX(), or custom calculations that identify the desired patterns.
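18. What is the purpose of FIRST_VALUE and LAST_VALUE functions? (Advanced)
FIRST_VALUE and LAST_VALUE are window functions that return the first and last values in a window frame, respectively. They're useful for comparing the current row with the initial or final value in a group. When the OVER clause has an ORDER BY but no explicit frame, the default frame ends at the current row (and its peers), so LAST_VALUE usually needs a frame such as ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING to reach the true last row.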
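The sketch below, using Python's built-in sqlite3 module (SQLite 3.25 or newer assumed), shows both functions and how the frame affects LAST_VALUE. With ORDER BY and no explicit frame, the frame stops at the current row, so LAST_VALUE returns the current row's own value. The prices table is hypothetical.

```python
import sqlite3

# Hypothetical table: one price per day.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE prices (day INTEGER, price INTEGER)")
conn.executemany("INSERT INTO prices VALUES (?, ?)", [(1, 10), (2, 12), (3, 9)])

value_rows = conn.execute(
    """
    SELECT day,
           FIRST_VALUE(price) OVER (ORDER BY day) AS first_price,
           -- default frame: ends at the current row
           LAST_VALUE(price) OVER (ORDER BY day) AS last_default,
           -- explicit frame: spans the whole partition
           LAST_VALUE(price) OVER (
               ORDER BY day
               ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
           ) AS last_full
    FROM prices
    ORDER BY day
    """
).fetchall()
print(value_rows)  # [(1, 10, 10, 9), (2, 10, 12, 9), (3, 10, 9, 9)]
```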
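19. How do you handle timezone differences in GROUP BY operations with timestamps? (Advanced)
Timezone differences can be handled by converting timestamps to a single timezone before grouping, for example with AT TIME ZONE where the database supports it, or by converting to UTC. When grouping by calendar day, convert to the reporting timezone first, because the day boundary depends on the zone. This ensures consistent grouping across different timezones.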
20. What is the difference between LAG() and LEAD() functions? (Advanced)
LAG() accesses data from previous rows while LEAD() accesses data from subsequent rows in a result set. Both are window functions useful for comparing current rows with offset rows within groups.
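A small sketch of both functions using Python's built-in sqlite3 module, assuming SQLite 3.25 or newer. The prices table is hypothetical. With the default offset of 1 and no default value, the first row has no previous row and the last row has no next row, so those positions come back as NULL.

```python
import sqlite3

# Hypothetical table: one price per day.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE prices (day INTEGER, price INTEGER)")
conn.executemany("INSERT INTO prices VALUES (?, ?)", [(1, 10), (2, 12), (3, 9)])

# LAG looks one row back, LEAD one row ahead, in day order.
lag_lead_rows = conn.execute(
    """
    SELECT day,
           LAG(price)  OVER (ORDER BY day) AS prev_price,
           LEAD(price) OVER (ORDER BY day) AS next_price
    FROM prices
    ORDER BY day
    """
).fetchall()
print(lag_lead_rows)  # [(1, None, 12), (2, 10, 9), (3, 12, None)]
```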
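21. How can you identify outliers within groups? (Advanced)
Outliers can be identified by using window functions to calculate statistical measures such as the mean and standard deviation within each group, then filtering for values that deviate significantly from the group's average. Because window function results cannot be referenced in the WHERE or HAVING clause of the same query level, the filter goes in an outer query or follows a CTE.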
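One way to sketch this with Python's built-in sqlite3 module (SQLite 3.25 or newer assumed) is shown below. SQLite has no built-in standard deviation function, so the example derives the population variance as mean of squares minus squared mean and compares squared deviations instead. The readings table, the data, and the two-standard-deviation threshold are all assumptions chosen for illustration.

```python
import sqlite3

# Hypothetical data: group A has one extreme value, group B has none.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE readings (grp TEXT, value INTEGER)")
conn.executemany(
    "INSERT INTO readings VALUES (?, ?)",
    [("A", 10)] * 9 + [("A", 100)] + [("B", 5)] * 4,
)

# The CTE attaches per-group statistics to every row; the outer query filters.
# Condition: (value - mean)^2 > 4 * variance, i.e. more than 2 std devs away.
outlier_rows = conn.execute(
    """
    WITH stats AS (
        SELECT grp, value,
               AVG(value) OVER (PARTITION BY grp) AS mean,
               AVG(value * value) OVER (PARTITION BY grp) AS mean_sq
        FROM readings
    )
    SELECT grp, value
    FROM stats
    WHERE (value - mean) * (value - mean) > 4 * (mean_sq - mean * mean)
    """
).fetchall()
print(outlier_rows)  # [('A', 100)]
```

On engines that provide STDDEV or a similar function, the window expression can use it directly instead of the manual variance.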
22. What is the importance of ORDER BY in window functions? (Moderate)
ORDER BY in window functions determines the sequence of rows for operations like running totals, moving averages, and LAG/LEAD functions. It's crucial for time-series analysis and sequential calculations.
23. How do you calculate moving averages in SQL? (Advanced)
Moving averages are calculated using window functions with ROWS or RANGE in the OVER clause, such as AVG(value) OVER (ORDER BY date ROWS BETWEEN 2 PRECEDING AND CURRENT ROW).
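The frame from the answer above can be tried out in a sketch like this, using Python's built-in sqlite3 module (SQLite 3.25 or newer assumed). The daily table and its data are hypothetical.

```python
import sqlite3

# Hypothetical table: one value per day.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE daily (day INTEGER, value INTEGER)")
conn.executemany(
    "INSERT INTO daily VALUES (?, ?)",
    [(1, 10), (2, 20), (3, 30), (4, 40)],
)

# Three-row window: the current row plus up to two rows before it.
# The first two rows average over fewer rows because the window is clipped.
moving_rows = conn.execute(
    """
    SELECT day,
           AVG(value) OVER (
               ORDER BY day
               ROWS BETWEEN 2 PRECEDING AND CURRENT ROW
           ) AS moving_avg
    FROM daily
    ORDER BY day
    """
).fetchall()
print(moving_rows)  # [(1, 10.0), (2, 15.0), (3, 20.0), (4, 30.0)]
```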
24. What is the difference between ROWS and RANGE in window functions? (Advanced)
ROWS defines the window frame by counting physical rows, while RANGE defines it by the ORDER BY value, so all rows that tie with the current row (its peers) enter the frame together. ROWS is used for fixed-size windows, RANGE for value-based windows.
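The difference shows up as soon as the ordering column has ties. The sketch below uses Python's built-in sqlite3 module (SQLite 3.25 or newer assumed) and a hypothetical events table in which day 2 appears twice.

```python
import sqlite3

# Hypothetical table: two rows share day 2.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (day INTEGER, amount INTEGER)")
conn.executemany(
    "INSERT INTO events VALUES (?, ?)",
    [(1, 10), (2, 20), (2, 30), (3, 40)],
)

frame_rows = conn.execute(
    """
    SELECT day, amount,
           SUM(amount) OVER (
               ORDER BY day ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
           ) AS rows_sum,
           SUM(amount) OVER (
               ORDER BY day RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
           ) AS range_sum
    FROM events
    ORDER BY day, amount
    """
).fetchall()
for row in frame_rows:
    print(row)
# range_sum is 60 for both day-2 rows, because peers are summed together.
# rows_sum differs between the two day-2 rows; which one gets the smaller
# value depends on how the engine happens to order the tied rows.
```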
25. How do you handle concatenation of values within groups? (Moderate)
Group concatenation can be achieved using a string aggregate function, whose name depends on the database system: STRING_AGG() in PostgreSQL and SQL Server, GROUP_CONCAT() in MySQL and SQLite, and LISTAGG() in Oracle. Each combines values from multiple rows into a single string within each group.
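A sketch using SQLite's GROUP_CONCAT through Python's built-in sqlite3 module. The team table and its data are hypothetical. The order of values inside the concatenated string is not guaranteed in this form, so the example does not rely on it.

```python
import sqlite3

# Hypothetical table: names grouped by department.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE team (dept TEXT, name TEXT)")
conn.executemany(
    "INSERT INTO team VALUES (?, ?)",
    [("eng", "ann"), ("eng", "bob"), ("ops", "cy")],
)

# The second argument is the separator placed between values.
concat_rows = conn.execute(
    """
    SELECT dept, GROUP_CONCAT(name, ', ') AS members
    FROM team
    GROUP BY dept
    ORDER BY dept
    """
).fetchall()
for dept, members in concat_rows:
    print(dept, "->", members)
```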
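26. What is the purpose of the FILTER clause in aggregate functions? (Advanced)
The FILTER clause allows conditional aggregation by specifying which rows an aggregate includes, as in COUNT(*) FILTER (WHERE condition). It is often more readable than the equivalent CASE expression inside the aggregate. Not every database system supports it (PostgreSQL and SQLite do, for example), and any performance difference depends on the engine.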
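The sketch below puts FILTER next to the CASE form it replaces, using Python's built-in sqlite3 module. It assumes the bundled SQLite is 3.30 or newer, which is when FILTER on aggregate functions became available. The orders table and its data are hypothetical.

```python
import sqlite3

# Hypothetical table and data, for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (status TEXT, amount INTEGER)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?)",
    [("paid", 100), ("refunded", 40), ("paid", 60)],
)

# The first two columns compute the same thing in two styles.
filter_row = conn.execute(
    """
    SELECT SUM(amount) FILTER (WHERE status = 'paid')        AS paid_filter,
           SUM(CASE WHEN status = 'paid' THEN amount END)    AS paid_case,
           COUNT(*) FILTER (WHERE status = 'refunded')       AS refunds
    FROM orders
    """
).fetchone()
print(filter_row)  # (160, 160, 1)
```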
27. How do you calculate median values in SQL? (Advanced)
Median calculation varies by database system. Common approaches include PERCENTILE_CONT(0.5) WITHIN GROUP (ORDER BY column), specialized functions like MEDIAN() where available, or a manual calculation using window functions and row numbers.
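The manual approach with row numbers could be sketched as follows, using Python's built-in sqlite3 module (SQLite 3.25 or newer assumed). The nums table is hypothetical. The query numbers the sorted values, picks the middle row (or the two middle rows when the count is even), and averages them.

```python
import sqlite3

# Hypothetical table with an even number of values.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE nums (v INTEGER)")
conn.executemany("INSERT INTO nums VALUES (?)", [(7,), (1,), (5,), (3,)])

# Integer division makes (n + 1) / 2 and (n + 2) / 2 equal for odd n
# and adjacent for even n, so the IN list selects one or two middle rows.
median = conn.execute(
    """
    WITH ordered AS (
        SELECT v,
               ROW_NUMBER() OVER (ORDER BY v) AS rn,
               COUNT(*) OVER () AS n
        FROM nums
    )
    SELECT AVG(v)
    FROM ordered
    WHERE rn IN ((n + 1) / 2, (n + 2) / 2)
    """
).fetchone()[0]
print(median)  # 4.0, the mean of the middle values 3 and 5
```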
28. What is the difference between aggregate and analytic functions? (Moderate)
Aggregate functions group rows into a single result row, while analytic functions (window functions) perform calculations across rows while maintaining individual row details in the result set.
29. How can you pivot data using aggregate functions? (Advanced)
Data pivoting can be achieved using aggregate functions with CASE expressions or the PIVOT operator (if supported by the database). This transforms row values into columns, creating cross-tabulated results.
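A sketch of the CASE-based pivot using Python's built-in sqlite3 module. The sales table, the quarter labels, and the data are hypothetical. Each output column sums only the rows that match its quarter.

```python
import sqlite3

# Hypothetical table: one row per sale, with the quarter as a row value.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, quarter TEXT, amount INTEGER)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?)",
    [
        ("East", "Q1", 10),
        ("East", "Q2", 20),
        ("West", "Q1", 30),
        ("West", "Q2", 40),
        ("East", "Q1", 5),
    ],
)

# Quarter values become columns; ELSE 0 keeps non-matching rows out of the sum.
pivot_rows = conn.execute(
    """
    SELECT region,
           SUM(CASE WHEN quarter = 'Q1' THEN amount ELSE 0 END) AS q1,
           SUM(CASE WHEN quarter = 'Q2' THEN amount ELSE 0 END) AS q2
    FROM sales
    GROUP BY region
    ORDER BY region
    """
).fetchall()
print(pivot_rows)  # [('East', 15, 20), ('West', 30, 40)]
```

The list of output columns must be written out by hand in this form, so it suits cases where the pivoted values are known in advance.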