I love to create data cubes. One of the most intense aspects of data cubes is the data collection, which I’ll leave for a future tip. Data organizing and formatting for analysis / reporting closely follows data collection intensity for me. So for you number jockeys, today’s tip is the formula SUMPRODUCT and why it is so valuable.
Technically, SUMPRODUCT, allows you to work with arrays and perform vector math. You can go to Excel / Calc help and look up the formula name and get the basic limited view of how to use the formula. I am going to give a practical example that has allowed me to simplify and automate the data organizing and analysis.
My original problem was attempting to count how many times a specific team won. In my data set, a team can reside in both the Home and Away columns depending on where they were playing that week. So I was trying to count values only if it was associated with a specific team in a specific column. I tried to embed a second set of criteria within the COUNTIF() function only to get errors from those resulting formulas.
As a result of my findings, we will examine three functions; COUNTIF(), SUM(), AND SUMIF(), and how they can be replaced by SUMPRODUCT which can have as many criteria you can come up with, in other words that wonderful concept of infinity.
In my first example, I want to count how many times the Home team won in week #1. I have two sheets titled ‘2007′ (holds the data) and ‘2007 Home Team Analysis’ (the sheet where I organize the data for analysis). That produces the following formula =COUNTIF(’2007′.T2:T17;”H”). Now you ask, what do I do for week #2. Well, you would copy the formula down to the next cell and change lookup field in column T resulting in the following formula =COUNTIF(’2007′.T18:T33;”H”). Ok, not so bad until you take into account weeks 3-10 vary from 13 to 15 games per week due to the bye’s. Having to go in and manually update lookup fields gives me a head ache. I don’t know about you, but I love to find ways to keep me from getting a head ache.
Here’s the solution to this problem: On your analysis sheet (’2007 Home Team Analysis’) you have a column to identify the criteria. In this case I have a column (week #) that appears in both sheets, so instead of having to create 17 different lookup fields I now have only 1 lookup field that contains 256 rows for the entire NFL football season. Therefore, the new way to count home team wins is
=SUMPRODUCT((’2007′.$B$2:$B$257=$A3)*(’2007′.$T$2:$T$256=”H”))
Now for the cool part, the explanation of how this reduces the data cube to the result. The data starts with a column of 256 values (Column B on ‘2007′) and is reduced by matching only the values based on the criteria cell (Cell A3 in ‘2007 Home Team Analysis’). So using week 1 as an example, the array is reduced from 256 values to 16. Now continue to pare the data using the second criteria home team wins (Column T of of ‘2007′) . Now overlay that 16 cell template over values in Column T and the result is ten home team wins in week 1 of 2007.
Now, with not as much verbiage, here are some more example to ponder:
How many points did the home teams score in week 1:
=SUM(’2007′.R2:R17) is replaced by =SUMPRODUCT((’2007′.$B$2:$B$257=$A3)*(’2007′.$R$2:$R$257))
One step further, How many points did the home teams score if they won in week 1:
=SUMIF(’2007′.T34:T49;”H”;’2007′.R34:R49) is replaced by =SUMPRODUCT((’2007′.$B$2:$B$257=$A3)*(’2007′.$T$2:$T$257=”H”)*(’2007′.$R$2:$R$257))
In conclusion, SUMPRODUCT() allows you take some basic functions and expand them to solutions only limited by your own creativity. As for my original problem stated above, here’s the solution:
=SUMPRODUCT((’2007′.$E$2:$E$257=$C3)*(’2007′.$T$2:$T$257=”V”))+SUMPRODUCT((’2007′.$L$2:$L$257=$C3)*(’2007′.$T$2:$T$257=”H”))
The first SUMPRODUCT () looks in the road team column (E) for team (C3) and counts how many times they won on the road noted by the “V” criteriain column (T). The second SUMPRODUCT() looks in the home team column (L) for the same team (C3) and counts hom many times they won at home noted by the “H” criterian column (T).

Entries (RSS)