Home‎ > ‎sed/awk‎ > ‎

Pivot tables with awk





We have a tables with data like the following: 
fruit   qty  store   price
apple   23  walmart  5
apricot   235 giant    4
avocado   576 kroger   5
[...]


Pivot Sum

In Excel


we would take the data: 
$ cat file1.csv | column -t -s ,
fruit       qty  store    price
apple       34   walmart  5
apricot     34   giant    4
avocado     23   kroger   3
banana      54   safeway  2
blueberry   65   walmart  1
blackberry  5    safeway  6
apple       68   kroger   4
blackberry  65   walmart  6
banana      6    kroger   7
blackberry  576  safeway  9
banana      87   kroger   9
apple       78   walmart  5
blueberry   76   safeway  4
banana      67   walmart  5
apricot     9    kroger   6
apple       5    giant    3
blueberry   447  safeway  2
banana      76   giant    4
blackberry    56       walmart    6

and create the following pivot table that shows the total number of each of the fruits if we were to purchase all of them.  
ie: Apples = (34 + 68 + 78 + 5) = 185


In Awk

The awk command would be the following: 
awk 'BEGIN {FS=OFS=","} \
NR>1 \
{fruit[$1]+=$2} \
END {for (x in fruit) {print x,fruit[x]}}' \
$1 | sort -n

Where: 
  • "FS=OFS=","":  Field Separator "FS" = ","
  • "NR>1" : ignore the first line
  • "{fruit[$1]+=$2}" : for the array named after the fruit($1), add the quantity($2) of the matching fruits
  • "{for (x in fruit) {print x,fruit[x]}}" : go through the array, and print the two columns
  • "$1 | sort -n" : accept the file to analyze as the argument to the script, and then sort the resulting output


And the output would look like the following: 
$ ./pivot-file1.sh file1.csv 
apple,185
apricot,43
avocado,23
banana,290
blackberry,702
blueberry,588

Pivot Add

In Excel: 

we would take the data: 
$ cat file2.csv | column -t -s , 
column: line too long
fruit       qty   store    price
apple       23   walmart  5
apricot     235  giant    4
avocado     576  kroger   5
banana      98   safeway  6
blueberry   6    walmart  7
blackberry  5    walmart  3
apricot     767  kroger   6
blackberry  87   safeway  9
avocado     67   walmart  6
blackberry  78   kroger   2
apple       89   safeway  5
apricot     56   walmart  8
blackberry  65   kroger   9
apple       87   safeway  7
avocado     62   walmart  4
blackberry  34   kroger   2
avocado     23   safeway  4
apple       38   kroger   1
blackberry  95   walmart  2

and create the following pivot table that shows the number of times each of the fruits are sold at a store.  
ie: Apples = (walmart + safeway + safeway + kroger) = 4


In Awk


The awk command would be the following: 
awk 'BEGIN {FS=OFS=","} \
NR>1 \
{count[$1]++} \
END {for (word in count) {print word, count[word]}}' \
$1 | sort -n

Where: 
  • "FS=OFS=","":  Field Separator "FS" = ","
  • "NR>1" : ignore the first line
  • "{count[$1]++}" : count all matching fruit($1)
  • "{for (word in count) {print word, count[word]}" : Print the fruit, and then the number of times they showed up.  
  • "$1 | sort -n" : accept the file to analyze as the argument to the script, and then sort the resulting output



And the output would look like the following: 
./pivot-file2.sh file2.csv
apple,4
apricot,3
avocado,4
banana,1
blackberry,6
blueberry,1


Reference: 



Comments