Wednesday, December 21, 2011

Counting Unique Values of a Data Set

Here is a quick How-To from the Spreadsheets forum.  There are a few ways to do this in Google Docs but I wanted to highlight the use of the =query() function which uses GViz.  See these quick links for Google's Documentation:
Using =query()
Google Visualization API Query Language

I know "API" and "Visualization" and "Query Language" sound scary but they really aren't!  In this quick example I'll show you how to sum up data for unique values!

Consider the following data: Ted 5, Yogi 6, Adam 9, Ted 7, Adam 11, Yogi 13, Ted 6.
This example is meant for much larger data sets where there could be many different names of people with many different data points. Basically, the query() function will help us sum up the numbers next to each person's name by person and sort by the highest sum.


Breakdown of function from the center of the nest:
=query(A2:B8,"Select sum(B) pivot (A)") : This is the query used to collect the data that I would like. A2:B8 indicate the range of the data I'm going to query over. "Select sum(B) pivot(A)" says that I want to select the sum of column B by the values in column A. In spreadsheets world, this is basically a pivot table where you pivot on on the unique values from column A. This function alone puts the unique values (names) as their own column and puts the sum under it. However, most of the time it is easier to view this as the user in the post requested. See the Just query() tab above.


=transpose(query(A2:B8,"Select sum(B) pivot (A)")): This transposes the table. Basically it turns columns into rows and rows into column. See the transpose(query()) tab.


=sort(transpose(query(A2:B8,"Select sum(B) pivot (A)")),2,FALSE): The final step sorts the table by the summed column. The 2 indicates the second column in the table is the column to sort by and FALSE indicates to sort in descending order.

4 comments:

  1. the parameter separator on query() should be ; not just a comma

    ReplyDelete
  2. Hi guigouz,
    ";" is use in place of "," for some location settings. My location settings (US) use the comma instead of the semicolon when defining parameters. Feel free to replace any commas that define parameters with semicolons in your functions. This should make them work correctly!

    Good catch!

    ReplyDelete
  3. This was extremely helpful! Thanks! :)

    ReplyDelete
  4. =COUNTUNIQUE For values in a column if you only want to cunt repetitive data once
    car
    house
    car
    mouse
    = 3

    ReplyDelete