So I discovered Folium about two months ago and decided to map the Camino Primitivo (the Primitive Way) with it. The coordinate data is retrieved from Strava GPX files and cleaned up, leaving only latitude and longitude, as shown below.
head Camin_prim_stage1.csv
lat,lon
43.3111770,-5.6941620
43.3113360,-5.6943420
43.3114370,-5.6944600
43.3115000,-5.6945420
43.3116970,-5.6948090
43.3119110,-5.6950900
43.3122360,-5.6956830
43.3123220,-5.6958090
43.3126840,-5.6963740
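The clean-up step itself isn't covered in this post, but for reference, here is a minimal sketch of how a Strava GPX export could be reduced to such a CSV. It assumes the gpxpy library and a hypothetical input file name; any GPX parser would do.

import gpxpy

# Parse the GPX export (file name is hypothetical)
with open('Camin_prim_stage1.gpx') as f:
    gpx = gpxpy.parse(f)

# Keep only latitude and longitude, one point per line
with open('Camin_prim_stage1.csv', 'w') as out:
    out.write('lat,lon\n')
    for track in gpx.tracks:
        for segment in track.segments:
            for point in segment.points:
                out.write('%.7f,%.7f\n' % (point.latitude, point.longitude))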
Below is the Python file we will use to load the data and create the map with the three routes.
import folium
from pyspark.sql import SparkSession

spark = SparkSession.builder.master("local").getOrCreate()

# Change Spark log level
spark.sparkContext.setLogLevel('FATAL')

# Load the coordinates for the three stages from local files instead of HDFS
position1 = spark.read.load("/home/user/Camin_prim_stage1.csv",
                            format="csv", sep=",", inferSchema="true", header="true")
position2 = spark.read.load("/home/user/Camin_prim_stage2.csv",
                            format="csv", sep=",", inferSchema="true", header="true")
position3 = spark.read.load("/home/user/Camin_prim_stage3.csv",
                            format="csv", sep=",", inferSchema="true", header="true")
positions = [position1, position2, position3]

m = folium.Map()
colours = ['red', 'blue', 'green']

for x, colour in zip(positions, colours):
    # Uncomment to check the file was correctly loaded
    # x.printSchema()
    # x.show(2)

    # Collect the coordinates to the driver as [lat, lon] pairs
    coordinates = [[float(i.lat), float(i.lon)] for i in x.collect()]

    # Zoom the map to the route and draw it as a coloured polyline
    m.fit_bounds(coordinates, padding=(25, 25))
    folium.PolyLine(locations=coordinates, weight=5, color=colour).add_to(m)

    # Mark the first and last points of the stage
    folium.Marker(coordinates[0], popup="Origin").add_to(m)
    folium.Marker(coordinates[-1], popup="Destination").add_to(m)

# Save to an HTML file
m.save('chamin_prim.html')

# Cleanup
spark.stop()
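One thing worth noting: fit_bounds is called once per stage, so the final zoom reflects whichever route was drawn last. If you would rather frame all three stages at once, a small variation (a sketch, not part of the original script) is to accumulate the points and fit a single time at the end:

# Sketch: fit the map to every stage at once, assuming the same
# `positions` list and Folium map `m` as in the script above
all_coordinates = []
for x in positions:
    all_coordinates.extend([float(i.lat), float(i.lon)] for i in x.collect())
m.fit_bounds(all_coordinates, padding=(25, 25))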
We execute it as below, and the result gets saved into a file called chamin_prim.html:
spark-submit camin_prim.py; echo $?; ls -ltr | tail -1
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
18/08/19 20:05:47 WARN Utils: Your hostname, server resolves to a loopback address: 127.0.0.1; using 192.168.0.99 instead (on interface eth0)
18/08/19 20:05:47 WARN Utils: Set SPARK_LOCAL_IP if you need to bind to another address
18/08/19 20:05:48 INFO SparkContext: Running Spark version 2.2.0
18/08/19 20:05:48 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
18/08/19 20:05:49 INFO SparkContext: Submitted application: camin_prim.py
18/08/19 20:05:49 INFO SecurityManager: Changing view acls to: user
18/08/19 20:05:49 INFO SecurityManager: Changing modify acls to: user
18/08/19 20:05:49 INFO SecurityManager: Changing view acls groups to:
18/08/19 20:05:49 INFO SecurityManager: Changing modify acls groups to:
18/08/19 20:05:49 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(user); groups with view permissions: Set(); users with modify permissions: Set(user); groups with modify permissions: Set()
18/08/19 20:05:49 INFO Utils: Successfully started service 'sparkDriver' on port 41115.
18/08/19 20:05:49 INFO SparkEnv: Registering MapOutputTracker
18/08/19 20:05:49 INFO SparkEnv: Registering BlockManagerMaster
18/08/19 20:05:49 INFO BlockManagerMasterEndpoint: Using org.apache.spark.storage.DefaultTopologyMapper for getting topology information
18/08/19 20:05:49 INFO BlockManagerMasterEndpoint: BlockManagerMasterEndpoint up
18/08/19 20:05:49 INFO DiskBlockManager: Created local directory at /tmp/blockmgr-aedd3da1-84c0-4e3c-b094-30590422f0ca
18/08/19 20:05:49 INFO MemoryStore: MemoryStore started with capacity 366.3 MB
18/08/19 20:05:49 INFO SparkEnv: Registering OutputCommitCoordinator
18/08/19 20:05:49 WARN Utils: Service 'SparkUI' could not bind on port 4040. Attempting port 4041.
18/08/19 20:05:49 WARN Utils: Service 'SparkUI' could not bind on port 4041. Attempting port 4042.
18/08/19 20:05:49 INFO Utils: Successfully started service 'SparkUI' on port 4042.
18/08/19 20:05:50 INFO SparkUI: Bound SparkUI to 0.0.0.0, and started at http://192.168.0.99:4042
18/08/19 20:05:50 INFO SparkContext: Added file file:/home/user/camin_prim.py at file:/home/user/camin_prim.py with timestamp 1534701950407
18/08/19 20:05:50 INFO Utils: Copying /home/user/camin_prim.py to /tmp/spark-47458e06-2b68-4e19-b2a1-3172bf40e4e5/userFiles-5730b68b-f428-406a-9fb3-19ed8385f6ea/camin_prim.py
18/08/19 20:05:50 INFO Executor: Starting executor ID driver on host localhost
18/08/19 20:05:50 INFO Utils: Successfully started service 'org.apache.spark.network.netty.NettyBlockTransferService' on port 39599.
18/08/19 20:05:50 INFO NettyBlockTransferService: Server created on 192.168.0.99:39599
18/08/19 20:05:50 INFO BlockManager: Using org.apache.spark.storage.RandomBlockReplicationPolicy for block replication policy
18/08/19 20:05:50 INFO BlockManagerMaster: Registering BlockManager BlockManagerId(driver, 192.168.0.99, 39599, None)
18/08/19 20:05:50 INFO BlockManagerMasterEndpoint: Registering block manager 192.168.0.99:39599 with 366.3 MB RAM, BlockManagerId(driver, 192.168.0.99, 39599, None)
18/08/19 20:05:50 INFO BlockManagerMaster: Registered BlockManager BlockManagerId(driver, 192.168.0.99, 39599, None)
18/08/19 20:05:50 INFO BlockManager: Initialized BlockManager: BlockManagerId(driver, 192.168.0.99, 39599, None)
18/08/19 20:05:51 INFO SharedState: Setting hive.metastore.warehouse.dir ('null') to the value of spark.sql.warehouse.dir ('file:/home/user/spark-warehouse/').
18/08/19 20:05:51 INFO SharedState: Warehouse path is 'file:/home/user/spark-warehouse/'.
18/08/19 20:05:51 INFO StateStoreCoordinatorRef: Registered StateStoreCoordinator endpoint
0
-rw-r--r-- 1 user user 866998 Aug 19 20:05 chamin_prim.html
The resulting HTML file can be seen here.
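Since the map is a self-contained static HTML file, it can also be opened straight from the file system; if you want to do that programmatically, a one-line Python sketch:

import webbrowser

# Open the generated map in the default browser (path assumed relative
# to the current working directory)
webbrowser.open('chamin_prim.html')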