Category Archives: UNIX

Posts regarding UNIX and Linux systems

Run python script with docker

So I started playing with docker and asked myself whether it would be possible to run a python script with docker. Well, answer is yes. Example of the script below.

#!/usr/bin/python

import sys
print "Running script!!"
print sys.version_info

Execution below:

user@computer:$ docker run -it --rm --name pythonscript -v "$PWD":/usr/src/myapp -w /usr/src/myapp python:2 python script.py
Running script!!
sys.version_info(major=2, minor=7, micro=13, releaselevel='final', serial=0)

References:
Docker documentation

Using sqoop to import a DB table into HDFS

In the world of Big Data to import data from a DB into HDFS you need Apache Sqoop.

user@computer:$ sqoop import --connect jdbc:mysql://localhost/mysql --username training -P --warehouse-dir /home/training/db --table user
17/02/23 10:38:19 INFO sqoop.Sqoop: Running Sqoop version: 1.4.5-cdh5.2.0
Enter password:
17/02/23 10:38:24 INFO manager.SqlManager: Using default fetchSize of 1000
17/02/23 10:38:24 INFO tool.CodeGenTool: Beginning code generation
17/02/23 10:38:24 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `user` AS t LIMIT 1
17/02/23 10:38:24 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `user` AS t LIMIT 1
17/02/23 10:38:24 INFO orm.CompilationManager: HADOOP_MAPRED_HOME is /usr/lib/hadoop-0.20-mapreduce
Note: /tmp/sqoop-training/compile/7f3a9709c50f58c2c6bb24de91922c6b/user.java uses or overrides a deprecated API.
Note: Recompile with -Xlint:deprecation for details.
17/02/23 10:38:29 INFO orm.CompilationManager: Writing jar file: /tmp/sqoop-training/compile/7f3a9709c50f58c2c6bb24de91922c6b/user.jar
17/02/23 10:38:29 WARN manager.MySQLManager: It looks like you are importing from mysql.
17/02/23 10:38:29 WARN manager.MySQLManager: This transfer can be faster! Use the --direct
17/02/23 10:38:29 WARN manager.MySQLManager: option to exercise a MySQL-specific fast path.
17/02/23 10:38:29 INFO manager.MySQLManager: Setting zero DATETIME behavior to convertToNull (mysql)
17/02/23 10:38:29 WARN manager.CatalogQueryManager: The table user contains a multi-column primary key. Sqoop will default to the column Host only for this job.
17/02/23 10:38:29 WARN manager.CatalogQueryManager: The table user contains a multi-column primary key. Sqoop will default to the column Host only for this job.
17/02/23 10:38:29 INFO mapreduce.ImportJobBase: Beginning import of user
17/02/23 10:38:31 WARN mapred.JobClient: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
17/02/23 10:38:33 INFO db.DBInputFormat: Using read commited transaction isolation
17/02/23 10:38:33 INFO db.DataDrivenDBInputFormat: BoundingValsQuery: SELECT MIN(`Host`), MAX(`Host`) FROM `user`
17/02/23 10:38:33 WARN db.TextSplitter: Generating splits for a textual index column.
17/02/23 10:38:33 WARN db.TextSplitter: If your database sorts in a case-insensitive order, this may result in a partial import or duplicate records.
17/02/23 10:38:33 WARN db.TextSplitter: You are strongly encouraged to choose an integral split column.
17/02/23 10:38:33 INFO mapred.JobClient: Running job: job_201702221239_0003
17/02/23 10:38:34 INFO mapred.JobClient: map 0% reduce 0%
17/02/23 10:38:55 INFO mapred.JobClient: map 17% reduce 0%
17/02/23 10:38:56 INFO mapred.JobClient: map 33% reduce 0%
17/02/23 10:39:08 INFO mapred.JobClient: map 50% reduce 0%
17/02/23 10:39:09 INFO mapred.JobClient: map 67% reduce 0%
17/02/23 10:39:21 INFO mapred.JobClient: map 83% reduce 0%
17/02/23 10:39:22 INFO mapred.JobClient: map 100% reduce 0%
17/02/23 10:39:26 INFO mapred.JobClient: Job complete: job_201702221239_0003
17/02/23 10:39:26 INFO mapred.JobClient: Counters: 23
17/02/23 10:39:26 INFO mapred.JobClient: File System Counters
17/02/23 10:39:26 INFO mapred.JobClient: FILE: Number of bytes read=0
17/02/23 10:39:26 INFO mapred.JobClient: FILE: Number of bytes written=1778658
17/02/23 10:39:26 INFO mapred.JobClient: FILE: Number of read operations=0
17/02/23 10:39:26 INFO mapred.JobClient: FILE: Number of large read operations=0
17/02/23 10:39:26 INFO mapred.JobClient: FILE: Number of write operations=0
17/02/23 10:39:26 INFO mapred.JobClient: HDFS: Number of bytes read=791
17/02/23 10:39:26 INFO mapred.JobClient: HDFS: Number of bytes written=818
17/02/23 10:39:26 INFO mapred.JobClient: HDFS: Number of read operations=6
17/02/23 10:39:26 INFO mapred.JobClient: HDFS: Number of large read operations=0
17/02/23 10:39:26 INFO mapred.JobClient: HDFS: Number of write operations=6
17/02/23 10:39:26 INFO mapred.JobClient: Job Counters
17/02/23 10:39:26 INFO mapred.JobClient: Launched map tasks=6
17/02/23 10:39:26 INFO mapred.JobClient: Total time spent by all maps in occupied slots (ms)=89702
17/02/23 10:39:26 INFO mapred.JobClient: Total time spent by all reduces in occupied slots (ms)=0
17/02/23 10:39:26 INFO mapred.JobClient: Total time spent by all maps waiting after reserving slots (ms)=0
17/02/23 10:39:26 INFO mapred.JobClient: Total time spent by all reduces waiting after reserving slots (ms)=0
17/02/23 10:39:26 INFO mapred.JobClient: Map-Reduce Framework
17/02/23 10:39:26 INFO mapred.JobClient: Map input records=8
17/02/23 10:39:26 INFO mapred.JobClient: Map output records=8
17/02/23 10:39:26 INFO mapred.JobClient: Input split bytes=791
17/02/23 10:39:26 INFO mapred.JobClient: Spilled Records=0
17/02/23 10:39:26 INFO mapred.JobClient: CPU time spent (ms)=5490
17/02/23 10:39:26 INFO mapred.JobClient: Physical memory (bytes) snapshot=666267648
17/02/23 10:39:26 INFO mapred.JobClient: Virtual memory (bytes) snapshot=4423995392
17/02/23 10:39:26 INFO mapred.JobClient: Total committed heap usage (bytes)=191102976
17/02/23 10:39:26 INFO mapreduce.ImportJobBase: Transferred 818 bytes in 56.7255 seconds (14.4203 bytes/sec)
17/02/23 10:39:26 INFO mapreduce.ImportJobBase: Retrieved 8 records.

Example above dumps table user from mysql DB into hadoop.
First connect to DB using –connect
–username would by the authentication username, -P to ask for password at prompt. –warehouse-dir HDFS parent for table destination and –table to select the table to import.

Below dumped content is shown.

user@computer:$ hdfs dfs -cat /home/training/db/user/part-m-0000*
127.0.0.1,root,,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,,,,,0,0,0,0
localhost,root,,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,,,,,0,0,0,0
localhost.localdomain,root,,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,,,,,0,0,0,0
localhost,,,N,N,N,N,N,N,N,N,N,N,N,N,N,N,N,N,N,N,N,N,N,N,N,N,N,N,N,N,,,,,0,0,0,0
localhost.localdomain,,,N,N,N,N,N,N,N,N,N,N,N,N,N,N,N,N,N,N,N,N,N,N,N,N,N,N,N,N,,,,,0,0,0,0
localhost,training,*27CF0BD18BDADD517165824F8C1FFF667B47D04B,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,N,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,Y,,,,,0,0,0,0
localhost,hiveuser,*2470C0C06DEE42FD1618BB99005ADCA2EC9D1E19,N,N,N,N,N,N,N,N,N,N,N,N,N,N,N,N,N,N,N,N,N,N,N,N,N,N,N,N,,,,,0,0,0,0
localhost,hue,*15221DE9A04689C4D312DEAC3B87DDF542AF439E,N,N,N,N,N,N,N,N,N,N,N,N,N,N,N,N,N,N,N,N,N,N,N,N,N,N,N,N,,,,,0,0,0,0

Python script to download all NASDAQ and NYSE ticker symbols

Below is a short python script to get all NASDAQ and NYSE common stock tickers. You can then use the resulting file to get a lot of info using yahoofinance library.

#!/usr/bin/env python

import ftplib
import os
import re

# Connect to ftp.nasdaqtrader.com
ftp = ftplib.FTP('ftp.nasdaqtrader.com', 'anonymous', 'anonymous@debian.org')

# Download files nasdaqlisted.txt and otherlisted.txt from ftp.nasdaqtrader.com
for ficheiro in ["nasdaqlisted.txt", "otherlisted.txt"]:
        ftp.cwd("/SymbolDirectory")
        localfile = open(ficheiro, 'wb')
        ftp.retrbinary('RETR ' + ficheiro, localfile.write)
        localfile.close()
ftp.quit()

# Grep for common stock in nasdaqlisted.txt and otherlisted.txt
for ficheiro in ["nasdaqlisted.txt", "otherlisted.txt"]:
        localfile = open(ficheiro, 'r')
        for line in localfile:
                if re.search("Common Stock", line):
                        ticker = line.split("|")[0]
                        # Append tickers to file tickers.txt
                        open("tickers.txt","a+").write(ticker + "\n")

Small script to calculate future value

Heres a small script I wrote to calculate the future value.

#!/usr/bin/env python

# This script calculate the future value

iv = float(raw_input("Enter initial value: "));
rate = float(raw_input("Enter interest rate: "));
times = int(raw_input("Enter the amount of years: "));

fv = iv*((1 + (rate/100))**times)

print "Final amount is: %.2f." %fv

Get NYT for free

I got tired of paying for the NY Times and the Wall Street Journal. Its not worth it when you can look at the articles for free if you Google them.

This can be done having Firefox as your browser and having the Ref Control add-on. Once installed configure it placing https://google.com as the referrer.

RefControl config

RefControl config

This would basically tell NYT or WSJ that you are coming from Google instead of coming directly from your home or whereever you are located.