Post

CASE/WHEN inside SQL JOIN conditions

To start the post, I’d like to present the task I had to complete at my job in order to write the sample code I’m about to show here. Also, I’d like to emphasize that no real data is being used through this post.

Imagine we have the following table OBJECT_COSTS:

REFERENCECost
CUO0001_TY001_001$100
CUO0001_TY002_001$200
CUO0001_TY003_001$730
CUO0002_TY001_001$650
CUO0002_TY002_001$230
CUO0002_TY003_001$190
CUO0003_TY001_001$190
CUO0003_TY001_002$220
CUO0003_TY001_003$500
NA_NA_NA$850
CUO0001_NA_NA$500
NA_TY003_NA$100
NA_NA_002$800
CUO0003_TY001_NA$710
CUO0002_NA_001$280

Notice how some of the objects’ references have “NA” (Not Available) as part of their structure. We will call them “None Objects”. What we need to do here is to create a table that associates all the None Objects to the others such that their cost will be realocated to the “Complete Objects”.

Diving into the problem

So now it’s important to understand what the criteria is to realocate the costs of these None Objects. The idea is simple, we need to associate a None Object to all the Complete Objects whose structures match with the rest of the None Object structure that is available.

Let’s use an example to make it easier to digest. The object CUO0002_NA_001 has to realocate its costs to CUO0002_TY001_001, CUO0002_TY002_001 and CUO0002_TY003_001. Our assignments table needs to have the following lines:

SOURCE_REFERENCETARGET_REFERENCE
CUO0002_NA_001CUO0002_TY001_001
CUO0002_NA_001CUO0002_TY002_001
CUO0002_NA_001CUO0002_TY003_001

The question is, considering we have a huge OBJECT_COSTS table with millions of objects and possibly more levels on their reference structure than only 3 as shown in this sample example, how do we write an SQL code that can build this assignment table for us?

Splitting the references into levels

The code is actually quite simple, but the ideia behind it might not be obvious for everyone. For the first step, we will break all the objects references into levels to make the next steps easier. The following code is based on MySQL server:

1
2
3
4
5
6
7
8
9
DROP TABLE IF EXISTS OBJECT_COSTS_LEVELS;
CREATE TEMPORARY TABLE OBJECT_COSTS_LEVELS;

INSERT INTO OBJECT_COSTS_LEVELS
SELECT *,
	Substring_Index(REFERENCE, '_', 1) AS LV1,
    Substring_Index(Substring_Index(REFERENCE, '_', 2), '_', -1) AS LV2,
    Substring_Index(Substring_Index(REFERENCE, '_', 3), '_', -1) AS LV3
FROM OBJECT_COSTS;

That gives us the following table:

REFERENCECostLV1LV2LV3
CUO0001_TY001_001$100CUO0001TY001001
CUO0001_TY002_001$200CUO0001TY002001
CUO0001_TY003_001$730CUO0001TY003001
CUO0002_TY001_001$650CUO0002TY001001
CUO0002_TY002_001$230CUO0002TY002001
CUO0002_TY003_001$190CUO0002TY003001
CUO0003_TY001_001$190CUO0003TY001001
CUO0003_TY001_002$220CUO0003TY001002
CUO0003_TY001_003$500CUO0003TY001003
NA_NA_NA$850NANANA
CUO0001_NA_NA$500CUO0001NANA
NA_TY003_NA$100NATY003NA
NA_NA_002$800NANA002
CUO0003_TY001_NA$710CUO0003TY001NA
CUO0002_NA_001$280CUO0002NA001

Separating the Complete Objects

Now we can create a COMPLETE_OBJECT_COSTS_LEVELS with only the complete objects:

1
2
3
4
5
6
DROP TABLE IF EXISTS COMPLETE_OBJECT_COSTS_LEVELS;
CREATE TEMPORARY TABLE COMPLETE_OBJECT_COSTS_LEVELS;

INSERT INTO COMPLETE_OBJECT_COSTS_LEVELS
SELECT * FROM OBJECT_COSTS_LEVELS
WHERE LV1 != 'NA' AND LV2 != 'NA' AND LV3 != 'NA';

Results in the following table:

REFERENCECostLV1LV2LV3
CUO0001_TY001_001$100CUO0001TY001001
CUO0001_TY002_001$200CUO0001TY002001
CUO0001_TY003_001$730CUO0001TY003001
CUO0002_TY001_001$650CUO0002TY001001
CUO0002_TY002_001$230CUO0002TY002001
CUO0002_TY003_001$190CUO0002TY003001
CUO0003_TY001_001$190CUO0003TY001001
CUO0003_TY001_002$220CUO0003TY001002
CUO0003_TY001_003$500CUO0003TY001003

Creating the Assignments

We can now create the assignments between the OBJECT_COSTS_LEVELS and COMPLETE_OBJECT_COSTS_LEVELS by using multiple CASE/WHEN statements inside an SQL JOIN query:

1
2
3
4
5
6
7
8
9
10
11
SELECT
    T1.REFERENCE AS SOURCE_REFERENCE,
    T2.REFERENCE AS TARGET_REEFRENCE
FROM
    OBJECT_COSTS_LEVELS AS T1
        JOIN
    COMPLETE_OBJECT_COSTS_LEVELS AS T2
        ON
    CASE WHEN T1.LV1 = 'NA' THEN T2.LV1 ELSE T1.LV1 END = T2.LV1 AND
    CASE WHEN T1.LV2 = 'NA' THEN T2.LV2 ELSE T1.LV2 END = T2.LV2 AND
    CASE WHEN T1.LV3 = 'NA' THEN T2.LV3 ELSE T1.LV3 END = T2.LV3

By doing this, whenever the level is ‘NA’, then we force the match at that level. If not, then we compare T1.LVX and T2.LVX normally. We do this at every level and make sure all the conditions are true to create the assignments via JOIN. The result follows like so:

SOURCE_REFERENCETARGET_REEFRENCE
CUO0001_TY001_001CUO0001_TY001_001
CUO0001_TY002_001CUO0001_TY002_001
CUO0001_TY003_001CUO0001_TY003_001
CUO0002_TY001_001CUO0002_TY001_001
CUO0002_TY002_001CUO0002_TY002_001
CUO0002_TY003_001CUO0002_TY003_001
CUO0003_TY001_001CUO0003_TY001_001
CUO0003_TY001_002CUO0003_TY001_002
CUO0003_TY001_003CUO0003_TY001_003
NA_NA_NACUO0003_TY001_003
NA_NA_NACUO0003_TY001_002
NA_NA_NACUO0003_TY001_001
NA_NA_NACUO0002_TY003_001
NA_NA_NACUO0002_TY002_001
NA_NA_NACUO0002_TY001_001
NA_NA_NACUO0001_TY003_001
NA_NA_NACUO0001_TY002_001
NA_NA_NACUO0001_TY001_001
CUO0001_NA_NACUO0001_TY003_001
CUO0001_NA_NACUO0001_TY002_001
CUO0001_NA_NACUO0001_TY001_001
NA_TY003_NACUO0002_TY003_001
NA_TY003_NACUO0001_TY003_001
NA_NA_002CUO0003_TY001_002
CUO0003_TY001_NACUO0003_TY001_003
CUO0003_TY001_NACUO0003_TY001_002
CUO0003_TY001_NACUO0003_TY001_001
CUO0002_NA_001CUO0002_TY003_001
CUO0002_NA_001CUO0002_TY002_001
CUO0002_NA_001CUO0002_TY001_001

Final Alocated Cost computation

For simplicity, we didn’t bring the costs of them objects in previous section, bu modifying the columns selected we can bring them to our output. We will call the following table OBJECT_COST_ASSIGNMENTS:

SOURCE_REFERENCECOSTTARGET_REFERENCETARGET_COST
CUO0001_TY001_001100CUO0001_TY001_001100
CUO0001_TY002_001200CUO0001_TY002_001200
CUO0001_TY003_001730CUO0001_TY003_001730
CUO0002_TY001_001650CUO0002_TY001_001650
CUO0002_TY002_001230CUO0002_TY002_001230
CUO0002_TY003_001190CUO0002_TY003_001190
CUO0003_TY001_001190CUO0003_TY001_001190
CUO0003_TY001_002220CUO0003_TY001_002220
CUO0003_TY001_003500CUO0003_TY001_003500
NA_NA_NA850CUO0003_TY001_003500
NA_NA_NA850CUO0003_TY001_002220
NA_NA_NA850CUO0003_TY001_001190
NA_NA_NA850CUO0002_TY003_001190
NA_NA_NA850CUO0002_TY002_001230
NA_NA_NA850CUO0002_TY001_001650
NA_NA_NA850CUO0001_TY003_001730
NA_NA_NA850CUO0001_TY002_001200
NA_NA_NA850CUO0001_TY001_001100
CUO0001_NA_NA500CUO0001_TY003_001730
CUO0001_NA_NA500CUO0001_TY002_001200
CUO0001_NA_NA500CUO0001_TY001_001100
NA_TY003_NA100CUO0002_TY003_001190
NA_TY003_NA100CUO0001_TY003_001730
NA_NA_002800CUO0003_TY001_002220
CUO0003_TY001_NA710CUO0003_TY001_003500
CUO0003_TY001_NA710CUO0003_TY001_002220
CUO0003_TY001_NA710CUO0003_TY001_001190
CUO0002_NA_001280CUO0002_TY003_001190
CUO0002_NA_001280CUO0002_TY002_001230
CUO0002_NA_001280CUO0002_TY001_001650

From the table OBJECT_COST_ASSIGNMENTS, we can use the following SQL command to get the final alocated cost proportional to the target’s cost for each source object:

1
2
3
4
5
6
SELECT TARGET_REFERENCE AS REFERENCE, SUM(ALOCATED_COST) AS FINAL_COST FROM (
  SELECT *, COST * (TARGET_COST / (SUM(TARGET_COST) OVER (PARTITION BY SOURCE_REFERENCE))) AS ALOCATED_COST
  FROM OBJECT_COST_ASSIGNMENTS
) AS INNER_COMPUTED_ALLOCATION
GROUP BY TARGET_REFERENCE
ORDER BY REFERENCE;

The inner query computes them sum of all the target costs within the same source object and uses this value as weight for the cost that will be alocated in that line. The outer query will aggregate all alocated values for each target object and use their sum as the final cost of each target object. The final result follows below:

REFERENCEFINAL_COST
CUO0001_TY001_001176.78289197819566
CUO0001_TY002_001353.56578395639133
CUO0001_TY003_0011369.8629375277847
CUO0002_TY001_0011003.6482752196727
CUO0002_TY002_001355.1370820008073
CUO0002_TY003_001314.0262851311017
CUO0003_TY001_001391.89624329159216
CUO0003_TY001_0021253.7745974955278
CUO0003_TY001_0031031.3059033989266

Conclusion

There could be alternative ways to calculate the final cost of each complete object, we could use more sofisticated volumetries or formulas, but this one gets the job done for our example. We are basically alocating more for objects that already have higher costs, while alocating less for objects that don’t have much cost. It’s a criteria we defined that could be considered fair with no additional context. That should be enough to understand the ideia of reallocating costs of “None Objects” to “Complete Objects”.

I hope this can help anyone to apreciate how many interesting things we can do with JOINs in order to perform conditional assignments between multiple SQL tables very creatively.

See you in the next post!

This post is licensed under CC BY 4.0 by the author.