Fastest Way to Match Two Tables by Count of Matches
======================================================
In this article, we explore fast ways to match two tables based on the count of matches. We compare a window-function query with a CROSS APPLY query, then look at indexes, statistics, and a batched approach built on temporary tables.
Background
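The problem involves matching two tables, CODES_ADDED_UNPACKED and all_campaigns_t_unpacked. The goal is to determine a campaign code for each order in CODES_ADDED_UNPACKED whose campaign code is unknown. Each candidate campaign is scored by how many of the order's (service code, discount code) pairs it matches. The queries below use the score 10 × (matches with a non-blank discount code) + (total matches), so a match on a real discount code outweighs matches on blank ones, and the highest-scoring campaign wins.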
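To make the arithmetic of the scoring formula concrete, here is a minimal Python sketch. It is illustrative only and not part of the SQL solution: the function name `match_score` and the sample discount codes are hypothetical, and the formula is the one that appears in the ORDER BY clauses of the queries in this article.

```python
BLANK = "BLANK"  # placeholder the SQL queries substitute for a NULL discount code


def match_score(matched_discount_codes):
    """Score one (order, campaign) pair from the discount codes of its matching rows."""
    total_matches = len(matched_discount_codes)
    discount_matches = sum(1 for code in matched_discount_codes if code != BLANK)
    return 10 * discount_matches + total_matches


# One match on a real discount code outranks two matches on blank discount codes.
print(match_score(["D1"]))          # 11
print(match_score([BLANK, BLANK]))  # 2
```

With a weight of 10, a campaign needs more than ten additional blank-discount matches to overtake one real discount-code match.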
Table Structures
CREATE TABLE [dbo].[all_campaigns_t_unpacked](
Offername_Full [varchar](255) NULL,
Campaign_Code [varchar](3) NULL,
Service_Code [varchar](5) NULL,
Discount_Code [varchar](5) NULL
) ON [PRIMARY]
GO
CREATE TABLE [dbo].[CODES_ADDED_UNPACKED](
Service_Order_ID [bigint] NOT NULL,
Service_Code [varchar](5) NULL,
Discount_Code [varchar](5) NULL,
Campaign_Code [varchar](5) NULL,
Campaign_Code_Name [varchar](max) NULL
) ON [PRIMARY] TEXTIMAGE_ON [PRIMARY]
GO
Method 1: Window Function
DROP TABLE IF EXISTS #CTE;
-- Stage a sample of 100 orders. NULL discount codes become 'BLANK' so they can be joined and scored.
SELECT A.Service_Order_ID
    , A.Service_Code
    , COALESCE(A.Discount_Code,'BLANK') Discount_Code
    , A.Campaign_Code
    , COALESCE(A.Campaign_Code_Name,'UNKNOWN') Campaign_Code_Name
INTO #CTE
FROM [dbo].[CODES_ADDED_UNPACKED] A
INNER JOIN
    (SELECT DISTINCT TOP 100
        Service_Order_ID
    FROM
        [dbo].[CODES_ADDED_UNPACKED]) B
ON
    A.Service_Order_ID = B.Service_Order_ID;
-- Score every (order, campaign) pair and keep the top-ranked campaign per order.
-- Assumption: all_campaigns_t_unpacked also stores 'BLANK' for a missing discount code;
-- otherwise rows with a blank discount never join.
WITH CTE2 AS (
    SELECT
        TOP 1 WITH TIES
        UP.Service_Order_ID
        , UP.Campaign_Code_Name
        , UP.Campaign_Code CAMPAIGN_CODE
        , CC.Campaign_Code CAMPAIGN_CODE_CHECKED
    FROM
        #CTE UP
    INNER JOIN
        all_campaigns_t_unpacked CC
    ON
        UP.Service_Code = CC.Service_Code
        AND UP.Discount_Code = CC.Discount_Code
    GROUP BY
        UP.Service_Order_ID
        , UP.Campaign_Code_Name
        , UP.Campaign_Code
        , CC.Campaign_Code
    -- ROW_NUMBER() is 1 for the best campaign of each order, so TOP 1 WITH TIES returns one row per order.
    ORDER BY ROW_NUMBER() OVER (PARTITION BY UP.Service_Order_ID ORDER BY 10 * SUM(CASE WHEN UP.Discount_Code = 'BLANK' THEN 0 ELSE 1 END) + COUNT(*) DESC)
)
SELECT * FROM CTE2 ORDER BY Service_Order_ID;
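The ranking step of Method 1 can be tried without a SQL Server instance. The sketch below runs the same idea in SQLite through Python's `sqlite3` module. It is an approximation under stated assumptions: SQLite has no TOP ... WITH TIES, so the best row per order is selected by filtering on ROW_NUMBER() = 1, the table names `orders` and `campaigns` and all sample rows are hypothetical, and ties are broken by campaign code, which the original query does not do. Window functions need SQLite 3.25 or later.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE orders (Service_Order_ID INTEGER, Service_Code TEXT, Discount_Code TEXT);
CREATE TABLE campaigns (Campaign_Code TEXT, Service_Code TEXT, Discount_Code TEXT);
INSERT INTO orders VALUES
  (1, 'S1', 'D1'), (1, 'S2', 'BLANK'), (1, 'S3', 'BLANK'),
  (2, 'S2', 'BLANK');
INSERT INTO campaigns VALUES
  ('AAA', 'S1', 'D1'),
  ('BBB', 'S2', 'BLANK'), ('BBB', 'S3', 'BLANK');
""")

query = """
WITH scored AS (
    -- One row per (order, campaign) pair with its score.
    SELECT o.Service_Order_ID,
           c.Campaign_Code,
           10 * SUM(CASE WHEN o.Discount_Code = 'BLANK' THEN 0 ELSE 1 END) + COUNT(*) AS Score
    FROM orders o
    JOIN campaigns c
      ON o.Service_Code = c.Service_Code
     AND o.Discount_Code = c.Discount_Code
    GROUP BY o.Service_Order_ID, c.Campaign_Code
), ranked AS (
    -- Rank the campaigns of each order, best score first.
    SELECT scored.*,
           ROW_NUMBER() OVER (PARTITION BY Service_Order_ID
                              ORDER BY Score DESC, Campaign_Code) AS rn
    FROM scored
)
SELECT Service_Order_ID, Campaign_Code, Score
FROM ranked
WHERE rn = 1
ORDER BY Service_Order_ID
"""

best = conn.execute(query).fetchall()
print(best)  # [(1, 'AAA', 11), (2, 'BBB', 1)]
```

Order 1 matches campaign BBB on two blank-discount rows (score 2) and campaign AAA on one real discount code (score 11), so AAA is chosen.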
Method 2: Cross Apply
DROP TABLE IF EXISTS #CTE;
-- Stage the same 100-order sample as in Method 1.
SELECT A.Service_Order_ID
    , A.Service_Code
    , COALESCE(A.Discount_Code,'BLANK') Discount_Code
    , A.Campaign_Code
    , COALESCE(A.Campaign_Code_Name,'UNKNOWN') Campaign_Code_Name
INTO #CTE
FROM [dbo].[CODES_ADDED_UNPACKED] A
INNER JOIN
    (SELECT DISTINCT TOP 100
        Service_Order_ID
    FROM
        [dbo].[CODES_ADDED_UNPACKED]) B
ON
    A.Service_Order_ID = B.Service_Order_ID;
-- For each order, keep the single campaign with the highest score.
SELECT C.*
FROM (SELECT DISTINCT Service_Order_ID FROM #CTE) A
CROSS APPLY (
    SELECT TOP 1
        Service_Order_ID
        , Campaign_Code_Name
        , CAMPAIGN_CODE
        , CAMPAIGN_CODE_CHECKED
    FROM
    (
        SELECT
            UP.Service_Order_ID
            , UP.Campaign_Code_Name
            , UP.Campaign_Code CAMPAIGN_CODE
            , CC.Campaign_Code CAMPAIGN_CODE_CHECKED
            , UP.Discount_Code
        FROM #CTE UP
        CROSS APPLY (
            SELECT *
            FROM all_campaigns_t_unpacked CAMP
            WHERE
                UP.Service_Code = CAMP.Service_Code
                AND UP.Discount_Code = CAMP.Discount_Code
        ) CC
    ) B
    WHERE
        A.Service_Order_ID = B.Service_Order_ID
    GROUP BY
        Service_Order_ID
        , Campaign_Code_Name
        , CAMPAIGN_CODE
        , CAMPAIGN_CODE_CHECKED
    ORDER BY 10 * SUM(CASE WHEN Discount_Code = 'BLANK' THEN 0 ELSE 1 END) + COUNT(*) DESC
) C
ORDER BY C.Service_Order_ID;
Indexes and Statistics
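The problem statement mentions indexes on CODES_ADDED_UNPACKED and all_campaigns_t_unpacked. The index on (Service_Code, Discount_Code) supports the join predicate used by both methods, and statistics give the optimizer the row-count estimates it uses to choose a join strategy.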
CREATE INDEX IX_SERVICE_ORDER_ID_CAMPAIGN ON [CODES_ADDED_UNPACKED] (Service_Order_ID, Campaign_Code)
CREATE INDEX IX_SERVICE_CODE_DISCOUNT_CODE ON [all_campaigns_t_unpacked] (Service_Code, Discount_Code)
CREATE STATISTICS CODES_ADDED_STATS ON [CODES_ADDED_UNPACKED]
(Service_Order_ID, Campaign_Code_Name, Campaign_Code)
CREATE STATISTICS all_campaigns_t_unpacked_STATS ON [all_campaigns_t_unpacked]
(Campaign_Code)
Optimized Solution
Based on the analysis, we will present an optimized solution that incorporates the following techniques:
- Batch up rows of CODES_ADDED_UNPACKED into a temporary table with a clustered index.
- Create a clustered index on Service_Code and Discount_Code in all_campaigns_t_unpacked.
- Use a merge join between the temporary table and all_campaigns_t_unpacked.
- Group by the formula and insert into another temporary table, which is indexed by OrderId.
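The clustered indexes in the list above keep both inputs ordered by (Service_Code, Discount_Code), which is the property a merge join relies on: it walks the two sorted inputs in step and never has to search or re-sort. The Python sketch below illustrates that idea only. It is not SQL Server's implementation, and the function `merge_join` and the sample rows are hypothetical.

```python
def merge_join(left, right, left_key, right_key):
    """Inner-join two lists that are already sorted on their join keys."""
    pairs = []
    i = j = 0
    while i < len(left) and j < len(right):
        lk, rk = left_key(left[i]), right_key(right[j])
        if lk < rk:
            i += 1  # left key is behind: advance the left input
        elif lk > rk:
            j += 1  # right key is behind: advance the right input
        else:
            # Collect the run of right rows that share this key ...
            j_end = j
            while j_end < len(right) and right_key(right[j_end]) == lk:
                j_end += 1
            # ... and pair it with every left row that has the same key.
            while i < len(left) and left_key(left[i]) == lk:
                pairs.extend((left[i], r) for r in right[j:j_end])
                i += 1
            j = j_end
    return pairs


# Hypothetical rows, each list sorted by (service_code, discount_code).
orders = [(1, "S1", "D1"), (1, "S2", "BLANK"), (2, "S2", "BLANK")]
campaigns = [("AAA", "S1", "D1"), ("BBB", "S2", "BLANK"), ("CCC", "S9", "D9")]

matches = merge_join(orders, campaigns,
                     left_key=lambda o: (o[1], o[2]),
                     right_key=lambda c: (c[1], c[2]))
for order_row, campaign_row in matches:
    print(order_row[0], campaign_row[0])
```

Each input is read once from start to end, which is why pre-sorted (clustered) inputs make this join strategy attractive for large batches.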
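-- Batch up rows of CODES_ADDED_UNPACKED into a temporary table with a clustered index.
CREATE TABLE #CODES_ADDED_TEMP (
    Service_Order_ID [bigint] NOT NULL,
    Service_Code [varchar](5) NOT NULL,
    Discount_Code [varchar](5) NOT NULL,
    Campaign_Code [varchar](5) NULL, -- nullable: this is the value we are trying to determine
    Campaign_Code_Name [varchar](max) NOT NULL,
    INDEX CIX_CODES_ADDED_TEMP CLUSTERED (Service_Code, Discount_Code)
);
-- Load the batch; NULL discount codes become 'BLANK', as in Methods 1 and 2.
INSERT INTO #CODES_ADDED_TEMP (Service_Order_ID, Service_Code, Discount_Code, Campaign_Code, Campaign_Code_Name)
SELECT Service_Order_ID, Service_Code, COALESCE(Discount_Code, 'BLANK'), Campaign_Code, COALESCE(Campaign_Code_Name, 'UNKNOWN')
FROM [dbo].[CODES_ADDED_UNPACKED]
WHERE Service_Code IS NOT NULL;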
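-- Create a clustered index on Service_Code and Discount_Code in all_campaigns_t_unpacked.
-- The name differs from the nonclustered index created earlier because index names must be unique per table.
CREATE CLUSTERED INDEX CIX_SERVICE_CODE_DISCOUNT_CODE ON [all_campaigns_t_unpacked] (Service_Code, Discount_Code);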
-- Use a merge join between the temporary table and all_campaigns_t_unpacked.
-- "Merge join" here means the physical join operator, not the MERGE statement, which
-- would modify all_campaigns_t_unpacked. Both inputs are ordered by (Service_Code, Discount_Code),
-- so the optimizer can pick a merge join on its own; OPTION (MERGE JOIN) forces it
-- and can be dropped once the plan has been checked.
-- Group by the formula and insert into another temporary table, which is indexed by Order_Id.
CREATE TABLE #ORDER_TEMP (
    Order_Id [bigint] NOT NULL,
    Campaign_Code [varchar](5) NOT NULL,
    INDEX CIX_ORDER_TEMP CLUSTERED (Order_Id)
);
WITH Scored AS (
    -- One row per (order, campaign) pair with its score.
    SELECT
        S.Service_Order_ID
        , C.Campaign_Code
        , 10 * SUM(CASE WHEN S.Discount_Code = 'BLANK' THEN 0 ELSE 1 END) + COUNT(*) AS Score
    FROM #CODES_ADDED_TEMP AS S
    INNER JOIN all_campaigns_t_unpacked AS C
        ON S.Service_Code = C.Service_Code
        AND S.Discount_Code = C.Discount_Code
    WHERE C.Campaign_Code IS NOT NULL
    GROUP BY S.Service_Order_ID, C.Campaign_Code
), Ranked AS (
    -- Best score first; Campaign_Code breaks ties so the result is deterministic.
    SELECT
        Service_Order_ID
        , Campaign_Code
        , ROW_NUMBER() OVER (PARTITION BY Service_Order_ID ORDER BY Score DESC, Campaign_Code) AS rn
    FROM Scored
)
INSERT INTO #ORDER_TEMP (Order_Id, Campaign_Code)
SELECT Service_Order_ID, Campaign_Code
FROM Ranked
WHERE rn = 1
OPTION (MERGE JOIN);
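-- Update the Order table with the campaign code.
-- [Order] is not defined in this article; it is assumed to have Order_Id and Campaign_Code columns.
UPDATE O
SET O.Campaign_Code = OT.Campaign_Code
FROM [Order] O
INNER JOIN #ORDER_TEMP OT ON O.Order_Id = OT.Order_Id;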
Conclusion
In conclusion, we have presented an optimized solution for matching two tables based on the count of matches. The approach batches rows into temporary tables with clustered indexes, joins them to the campaign table with a merge join, and groups by the scoring formula to pick one campaign per order.
Note that the actual implementation may vary depending on the specific requirements of your database and application.
Last modified on 2024-05-23