Tag Archives: GNU

Export data from HDFS to MySQL

First create the DB and table where you want to populate.

echo "create database staff2; use staff2; CREATE TABLE editorial (id INT(100) unsigned not null AUTO_INCREMENT, name VARCHAR(20), email VARCHAR(20), primary key (id));" | mysql -u root -p

Once done, we have the data we want to copy in HDFS.

hdfs dfs -cat /home/training/staff/editorial/part-m-*
1,Peter,peter@example.com
2,Jack,jack@example.com

Now dump into MySQL using sqoop.

sqoop export --connect jdbc:mysql://localhost/staff2 --username root -P --table editorial --export-dir /home/training/staff/editorial
17/02/27 12:51:56 INFO sqoop.Sqoop: Running Sqoop version: 1.4.5-cdh5.2.0
Enter password: 
17/02/27 12:51:58 INFO manager.SqlManager: Using default fetchSize of 1000
17/02/27 12:51:58 INFO tool.CodeGenTool: Beginning code generation
17/02/27 12:51:59 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `editorial` AS t LIMIT 1
17/02/27 12:51:59 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `editorial` AS t LIMIT 1
17/02/27 12:51:59 INFO orm.CompilationManager: HADOOP_MAPRED_HOME is /usr/lib/hadoop-0.20-mapreduce
Note: /tmp/sqoop-training/compile/e560499b42a9738bbc5ef127712adc7b/editorial.java uses or overrides a deprecated API.
Note: Recompile with -Xlint:deprecation for details.
17/02/27 12:52:03 INFO orm.CompilationManager: Writing jar file: /tmp/sqoop-training/compile/e560499b42a9738bbc5ef127712adc7b/editorial.jar
17/02/27 12:52:03 INFO mapreduce.ExportJobBase: Beginning export of editorial
17/02/27 12:52:06 WARN mapred.JobClient: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
17/02/27 12:52:08 INFO input.FileInputFormat: Total input paths to process : 2
17/02/27 12:52:08 INFO input.FileInputFormat: Total input paths to process : 2
17/02/27 12:52:09 INFO mapred.JobClient: Running job: job_201702221239_0006
17/02/27 12:52:10 INFO mapred.JobClient:  map 0% reduce 0%
17/02/27 12:52:31 INFO mapred.JobClient:  map 50% reduce 0%
17/02/27 12:52:45 INFO mapred.JobClient:  map 100% reduce 0%
17/02/27 12:52:49 INFO mapred.JobClient: Job complete: job_201702221239_0006
17/02/27 12:52:49 INFO mapred.JobClient: Counters: 24
17/02/27 12:52:49 INFO mapred.JobClient:   File System Counters
17/02/27 12:52:49 INFO mapred.JobClient:     FILE: Number of bytes read=0
17/02/27 12:52:49 INFO mapred.JobClient:     FILE: Number of bytes written=1176756
17/02/27 12:52:49 INFO mapred.JobClient:     FILE: Number of read operations=0
17/02/27 12:52:49 INFO mapred.JobClient:     FILE: Number of large read operations=0
17/02/27 12:52:49 INFO mapred.JobClient:     FILE: Number of write operations=0
17/02/27 12:52:49 INFO mapred.JobClient:     HDFS: Number of bytes read=759
17/02/27 12:52:49 INFO mapred.JobClient:     HDFS: Number of bytes written=0
17/02/27 12:52:49 INFO mapred.JobClient:     HDFS: Number of read operations=19
17/02/27 12:52:49 INFO mapred.JobClient:     HDFS: Number of large read operations=0
17/02/27 12:52:49 INFO mapred.JobClient:     HDFS: Number of write operations=0
17/02/27 12:52:49 INFO mapred.JobClient:   Job Counters 
17/02/27 12:52:49 INFO mapred.JobClient:     Launched map tasks=4
17/02/27 12:52:49 INFO mapred.JobClient:     Data-local map tasks=4
17/02/27 12:52:49 INFO mapred.JobClient:     Total time spent by all maps in occupied slots (ms)=64216
17/02/27 12:52:49 INFO mapred.JobClient:     Total time spent by all reduces in occupied slots (ms)=0
17/02/27 12:52:49 INFO mapred.JobClient:     Total time spent by all maps waiting after reserving slots (ms)=0
17/02/27 12:52:49 INFO mapred.JobClient:     Total time spent by all reduces waiting after reserving slots (ms)=0
17/02/27 12:52:49 INFO mapred.JobClient:   Map-Reduce Framework
17/02/27 12:52:49 INFO mapred.JobClient:     Map input records=2
17/02/27 12:52:49 INFO mapred.JobClient:     Map output records=2
17/02/27 12:52:49 INFO mapred.JobClient:     Input split bytes=661
17/02/27 12:52:49 INFO mapred.JobClient:     Spilled Records=0
17/02/27 12:52:49 INFO mapred.JobClient:     CPU time spent (ms)=3390
17/02/27 12:52:49 INFO mapred.JobClient:     Physical memory (bytes) snapshot=422584320
17/02/27 12:52:49 INFO mapred.JobClient:     Virtual memory (bytes) snapshot=2940895232
17/02/27 12:52:49 INFO mapred.JobClient:     Total committed heap usage (bytes)=127401984
17/02/27 12:52:49 INFO mapreduce.ExportJobBase: Transferred 759 bytes in 42.9426 seconds (17.6748 bytes/sec)
17/02/27 12:52:49 INFO mapreduce.ExportJobBase: Exported 2 records.

Now we can see the content in MySQL DB named staff2.

echo "use staff2; SELECT * FROM editorial;" | mysql -u root -p 
Enter password: 
id	name	email
1	Peter	peter@example.com
2	Jack	jack@example.com

Create csv file ordered with highest yield dividend paying stock

Leave a reply

For this script to work Yahoo Finance is needed. It can be easily installed with pip. Stock tickers are taken from a file called tickers.txt.

#!/usr/bin/env python

import time
from yahoo_finance import Share

dict = {}
f = open("tickers.txt", "r")
for ticker in f.readlines():
        dividend = Share(ticker).get_dividend_yield()
        if dividend is not None:
                print ticker.rstrip()+": "+dividend
                if float(dividend) > 0:
                        dict[float(dividend)] = ticker.rstrip()
f.close()

filename = "dividends-"+time.strftime("%Y%m%d")+".csv"
f = open(filename, "a")
f.write("Ticker, Dividend Yield, Dividend Share, Stock Price\n")
print "Ticker; Dividend Yield; Dividend Share; Stock Price\n"
for k in reversed(sorted(dict.keys())):
        escribir = dict[k]+","+str(k)+","+str(float(Share(dict[k]).get_dividend_share()))+","+str(float(Share(dict[k]).get_price()))+"\n"
        print dict[k]+";"+str(k)+";"+Share(dict[k]).get_dividend_share()+";"+Share(dict[k]).get_price()
        f.write(escribir)
f.close()

When done it creates a csv (named dividends-YYYYMMDD.csv) file ordered by highest dividend yield paying stocks.

00:55:40 [me@server Finance]$ head -5 dividends-20160414.csv
Ticker, Dividend Yield, Dividend Share, Stock Price
GLBL,44.0,1.1,2.64
CLM,29.7,4.42,15.624
XRDC,22.99,0.6,2.65
CRF,22.1,3.98,16.75
00:55:49 [me@server Finance]$

Small script to calculate future value

Leave a reply

Heres a small script I wrote to calculate the future value.

#!/usr/bin/env python

# This script calculate the future value

iv = float(raw_input("Enter initial value: "));
rate = float(raw_input("Enter interest rate: "));
times = int(raw_input("Enter the amount of years: "));

fv = iv*((1 + (rate/100))**times)

print "Final amount is: %.2f." %fv

DNS records creator

Leave a reply

Back around here. Quick post of a small script to create A and PTR dns records.

Input file. First column is the IP and second column is the fqdn. We call this file hosts.txt

10.124.12.34 athletic.abc.com
192.158.21.32 deportivo.abc.com
92.32.43.12 drac1.abc.com

Script below:

#!/bin/bash

while read line; do
        IP=`echo $line | awk '{print $1}'`
        HOST=`echo $line | awk '{print $2}'`
        PTR=`echo "${IP}" | awk -F\. '{print $4"."$3"."$2"."$1".in-addr.arpa."}'`
        echo "${HOST}. IN A ${IP}"
        echo "${PTR} IN PTR ${HOST}."
done < hosts.txt

Execution:

me@server:/tmp$ bash dnsconverter.sh
athletic.abc.com. IN A 10.124.12.34
34.12.124.10.in-addr.arpa. IN PTR athletic.abc.com.
deportivo.abc.com. IN A 192.158.21.32
32.21.158.192.in-addr.arpa. IN PTR deportivo.abc.com.
drac1.abc.com. IN A 92.32.43.12
12.43.32.92.in-addr.arpa. IN PTR drac1.abc.com.
me@server:/tmp$

Project Euler problem 24

Leave a reply

Answer to problem 24 from Project Euler. Python is the programming language of choice for this problem too.

#!/usr/bin/env python
# Project Euler problem 24

import itertools

newlist = list(map("".join, itertools.permutations('0123456789')))
print newlist[999999]

XaviGNU

Me and my world

Tag Archives: GNU

Export data from HDFS to MySQL

Create csv file ordered with highest yield dividend paying stock

Small script to calculate future value

DNS records creator

Project Euler problem 24