{"id":1283,"date":"2017-01-04T17:55:48","date_gmt":"2017-01-04T22:55:48","guid":{"rendered":"http:\/\/www.xavignu.com\/?p=1283"},"modified":"2017-01-04T17:55:48","modified_gmt":"2017-01-04T22:55:48","slug":"sum-function-in-apache-pig","status":"publish","type":"post","link":"https:\/\/www.xavignu.com\/?p=1283","title":{"rendered":"SUM function in Apache Pig"},"content":{"rendered":"<p>I have been learning <a href=\"https:\/\/en.wikipedia.org\/wiki\/Big_data\" target=\"_blank\">Big Data<\/a> and, among other things, <a href=\"https:\/\/pig.apache.org\/\" target=\"_blank\">Apache Pig<\/a>. At first I assumed that once the data was loaded, applying <a href=\"https:\/\/pig.apache.org\/docs\/r0.7.0\/piglatin_ref2.html#SUM\" target=\"_blank\">SUM<\/a> inside a FOREACH would return the total amount, but that&#8217;s not the case. SUM works on a bag of values, so the records have to be grouped first (here with GROUP data ALL, which places every record in a single group); otherwise SUM raises an error.<\/p>\n<p><!--more--><\/p>\n<p>We will run the following script:<\/p>\n<p>[code language=\"text\"]<br \/>\n-- This Pig script sums the total amount of sales<br \/>\n-- First, load the data from the sales.txt file<br \/>\ndata = LOAD 'sales.txt' USING PigStorage(',') AS (name:chararray, price:int, country:chararray);<\/p>\n<p>-- Group all records into a single bag<br \/>\ngrouped = GROUP data ALL;<\/p>\n<p>-- Once grouped, generate the total sum of all sales<br \/>\ntotal = FOREACH grouped GENERATE SUM(data.price);<\/p>\n<p>-- Print the result to the screen<br \/>\nDUMP total;<br \/>\n[\/code]<\/p>\n<p>Save the above code in a file with a .pig extension (totalsales.pig in the command below). 
The script reads its test data from sales.txt, which contains:<br \/>\n[code language=\"text\"]<br \/>\nAlice,3000,us<br \/>\nAlice,2000,us<br \/>\nBob,500,ca<br \/>\nJuan,500,mx<br \/>\nHans,2000,de<br \/>\nJoan,1000,fr<br \/>\nPiero,6000,it<br \/>\n[\/code]<\/p>\n<p>Then run the script in local mode:<\/p>\n<pre id=\"terminal\">pig -x local totalsales.pig \r\n17\/01\/04 14:28:57 WARN mapreduce.Counters: Group org.apache.hadoop.mapred.Task$Counter is deprecated. Use org.apache.hadoop.mapreduce.TaskCounter instead\r\n17\/01\/04 14:28:57 INFO util.ProcessTree: setsid exited with exit code 0\r\n17\/01\/04 14:28:58 WARN mapreduce.Counters: Group org.apache.hadoop.mapred.Task$Counter is deprecated. Use org.apache.hadoop.mapreduce.TaskCounter instead\r\n(15000) \r\n<\/pre>\n<p>Reference:<br \/>\n<a href=\"https:\/\/www.thomashenson.com\/sum-field-apache-pig\/\">Thomas Henson<\/a><\/p>\n","protected":false},"excerpt":{"rendered":"<p>I have been learning Big Data and, among other things, Apache Pig. At first I assumed that once the data was loaded, applying SUM inside a FOREACH would return the total amount, but that&#8217;s not the case. 
The records have to be grouped first; otherwise SUM raises an error.<\/p>\n","protected":false},"author":2,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"spay_email":"","jetpack_publicize_message":"","jetpack_is_tweetstorm":false,"jetpack_publicize_feature_enabled":true},"categories":[74],"tags":[20,75,6,23,70],"jetpack_featured_media_url":"","jetpack_publicize_connections":[],"jetpack_shortlink":"https:\/\/wp.me\/pTQgt-kH","jetpack_sharing_enabled":true,"_links":{"self":[{"href":"https:\/\/www.xavignu.com\/index.php?rest_route=\/wp\/v2\/posts\/1283"}],"collection":[{"href":"https:\/\/www.xavignu.com\/index.php?rest_route=\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.xavignu.com\/index.php?rest_route=\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.xavignu.com\/index.php?rest_route=\/wp\/v2\/users\/2"}],"replies":[{"embeddable":true,"href":"https:\/\/www.xavignu.com\/index.php?rest_route=%2Fwp%2Fv2%2Fcomments&post=1283"}],"version-history":[{"count":7,"href":"https:\/\/www.xavignu.com\/index.php?rest_route=\/wp\/v2\/posts\/1283\/revisions"}],"predecessor-version":[{"id":1290,"href":"https:\/\/www.xavignu.com\/index.php?rest_route=\/wp\/v2\/posts\/1283\/revisions\/1290"}],"wp:attachment":[{"href":"https:\/\/www.xavignu.com\/index.php?rest_route=%2Fwp%2Fv2%2Fmedia&parent=1283"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.xavignu.com\/index.php?rest_route=%2Fwp%2Fv2%2Fcategories&post=1283"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.xavignu.com\/index.php?rest_route=%2Fwp%2Fv2%2Ftags&post=1283"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}