Contraflo there is an alternative...

Fan Traps

posted by Mark @ 5:29pm, Wednesday 15 September 2010.

Introduction

As my previous article alluded, much of the information found on the Web about Chasm Traps and Fan Traps is in-complete, very specific, or simply too brief.

In this article on Fan Traps I hope to provide more generic examples and specifically provide supporting SQL scripts that you can play with and try to resolve yourself.

An understanding of Dimensions and Measure is useful at this point but not required.

 

So what is a Fan Trap?

Don’t worry if this part doesn’t make sense at first, that’s what the examples are for.

A Fan Trap is so called because the tables and their relationships ‘Fan Out’.

The most common example of this is 3 tables joined from left to right with 2 one-to-many joins between them, something like this (shoddy) diagram:

                                   

The diagram shows a Range table, an Item table and a weekly sales table.  Each Range has many Items and each Item may have sold in 1 or more calendar weeks.

At first you may be asking “what’s wrong with those table relationships?”, and the answer is nothing.  There is nothing wrong with the relationships in the diagram, and this is why it’s called a trap as there may be nothing obviously wrong.

To understand why the relationships are a Fan Trap, we need to look at how the data in those tables might be used to calculate aggregate figures.  For example a problem occurs when we attempt to include the number of items per Range with the total sales of said items in a single report/result set.

 

Example (scripts available from the downloads area)

Assume the Tables have the following data and structures:

Range Rangedesc
1 Toys
2 Books
3 DVDs


Item Itemdesc Range
1 Ball 1
2 Teddy Picnic 2
3 Lion King 3
4 Bat 1
5 Pets Book 2
6 Shrek 3

 

Wk Item Sales
200901 1 10
200901 2 20
200901 3 30
200902 4 10
200902 5 20
200902 6 30
200903 1 10
200903 2 20
200903 3 30
200904 4 10
200904 5 20
200904 6 30

The SQL to find out how many items are in each range could look like this:

SELECT
      r.Rangeno,
      r.Rangedesc,
      COUNT(Item) AS ItemCount
FROM
      Range AS r
      INNER JOIN Item AS i
             ON r.Rangeno = i.Rangeno
GROUP BY
      r.Rangeno,
      r.Rangedesc

The Result set looking like this:

Rangeo Rangedesc ItemCount
1 Toys 2
2 Books 2
3 DVDs 2

The SQL to find out total sales per Item could look like this:

SELECT
    i.Item,
    i.Itemdesc,
    SUM(s.Sales) AS TotalSales
FROM
    Item AS i
    INNER JOIN WeeklySales AS s
        ON i.item = s.Item
GROUP BY
    i.item,
    i.Itemdesc


The Result Set:

Item Itemdesc TotalSales
1 Ball 20
2 Teddy Picnic 40
3 Lion King 60
4 Bat 20
5 Pets Book 40
6 Shrek 60

So far you will have noticed that everything is correct, none of our figures are negatively impacted. 

Now lets say that a Manager wants to see each Range, the number of Items in a Range and the Total Sales of all Items in a Range.

We could use the below SQL statement to achieve this.

SELECT
       r.Rangeno,
       r.Rangedesc,
       COUNT(Item) AS ItemCount,
       SUM(s.Sales) AS TotalSales
FROM
       Range AS r
       INNER JOIN Item AS i
              ON r.Rangeno = i.Rangeno
       INNER JOIN WeeklySales AS s
              ON i.item = s.Item
GROUP BY
       r.Rangeno,
       r.Rangedesc
 

The Result Set:

Rangeno Rangedesc ItemCount TotalSales
1 Toys 4 40
2 Books 4 80
3 DVDs 4 120

The above Result Set is incorrect.  Although the SQL is completely valid and successfully executes returning results, the results are incorrect.

As we identified from our previous query, the ItemCount for our 3 Ranges should be 2, not 4 as shown above.  This is caused by the Fan Trap.

As you may have spotted from the SQL statement, the issue is that we are aggregating the Item table and the WeeklySales table, however both are at different aggregate levels to begin with.

Simple solution is to use COUNT(DISTINCT(Item)).  Another solution would be to perform separate aggregate queries and join the Result Sets together:

SELECT
       r.Rangeno,
       r.Rangedesc,
       i.ItemCount,
       s.TotalSales
FROM
       FAN_Range AS r
       INNER JOIN (
                     SELECT
                           i.Rangeno,
                           COUNT(i.Item) AS ItemCount
                     FROM
                           FAN_Item AS i
                     GROUP BY
                           i.Rangeno
              ) AS i
              ON r.Rangeno = i.Rangeno
       INNER JOIN (
                     SELECT
                           i.Rangeno,
                           SUM(s.Sales) AS TotalSales
                     FROM
                           FAN_Item AS i
                           INNER JOIN FAN_WeeklySales AS s
                                  ON i.Item = s.Item
                     GROUP BY
                           i.Rangeno
              ) AS s
              ON r.Rangeno = s.Rangeno

Both are easy if you are hand coding SQL for reports and data requests, in-fact I suspect many professionals would know the data and automatically write the correct queries working around the Fan Trap issues without even thinking about it. 

 

Why you need to know about Fan Traps?

Although you may well already work around Fan Traps in your hand coded SQL, its still worth while having an understanding of them so that you can identify them faster, and actually start trying to avoid them. 

The examples here are very simplistic, but as queries get complicated joining various tables, knowledge of Fan Traps can help you avoid becoming a victim of them even when on first run the query appears to perform as expected.

 

Perhaps a bigger issue is that most off the shelf Reporting Tools (Business Objects etc.) generate SQL statements based on data models (eg. BO Universes), and therefore need to be told how to manage Fan Traps.

Without a good understanding of Fan Traps a data model could be designed that passes preliminary testing, but later gets caught out providing erroneous figures.

 

Solutions

As shown above, if you are in complete control of the SQL Query the solution is straight forward.

Data model solutions are a little tricky as the solution can depend entirely on your data model and the reporting software used.

The key is to be consistent in your approach to aggregates, ensuring you define aggregates for all measures.

Also where possible define aggregate priorities.  In some cases an aggregate could be calculated in multiple ways, define them and the priority.

Unfortunately it would be impossible to list solutions to every scenario here, but the key is if you have an understanding of Fan Traps you can solve them or work around them in your given situation.

 

Conclusion

Hopefully by now you have a better understanding of Fan Traps, and are better equipped to deal with them.

If you have any questions, about the Article, please leave a comment and I will try to get back to you, or alternatively, get in contact via the Contact page.

Remember to check out the resource links from the Fan and Chasm Traps Article.

  • Del.icio.us
  • Digg
  • Technorati
  • Blinklist
  • Furl
  • Reddit
  • Facebook
  • Twitter

Comments

Hi Faris, nice blog you have it here. I have downloaded your SEO friendly blog script, will try to integrate into mine at http://www/wanluqman.com . Not so much familiar with your Fan Traps post.

wanluqman - 9:26am, Tuesday 28 September 2010.

...careless of me , my url is http://www.wanluqman.com if you would like to dropby , rambling pages & blur direction.

wanluqman - 9:56am, Tuesday 28 September 2010.

Thanks. It is very well explained. Waiting for chasm traps.

Guest - 11:38am, Tuesday 9 November 2010.

Submit Your Comment