Tag Archives: pyspark

Mapping Pirenaica with Folium

So back in September I cycled through the Pyrenees and decided to map it with Folium. Latitude and longitude info was retrieved from Strava gpx files and cleaned up using sed, grep and awk. Result is a file as below.

head Pirenaica_stage_1.csv
lat,lon
43.3057130,-1.9778780
43.3057130,-1.9778770
43.3057140,-1.9778770
43.3057230,-1.9778880
43.3060960,-1.9783310
43.3064990,-1.9788360
43.3065760,-1.9789890
43.3067630,-1.9791790
43.3069250,-1.9791730

Continue reading

Primitive way with Folium

So I discovered Folium about two months ago and decided to map the primitive way with it. Coordinates data is retrieved from Strava gpx files and cleaned up leaving only latitude and longitude as below.

head Camin_prim_stage1.csv
lat,lon
43.3111770,-5.6941620
43.3113360,-5.6943420
43.3114370,-5.6944600
43.3115000,-5.6945420
43.3116970,-5.6948090
43.3119110,-5.6950900
43.3122360,-5.6956830
43.3123220,-5.6958090
43.3126840,-5.6963740

Below is the python file we will use to retrieve data and create the map with the routes.

import folium
from pyspark.sql import SparkSession
from pyspark.sql.functions import col
spark = SparkSession.builder.master("local").getOrCreate()

# Change Spark loglevel
spark.sparkContext.setLogLevel('FATAL')

# Load the rides and ride_routes data from local instead of HDFS
position1 = spark.read.load("/home/user/Camin_prim_stage1.csv", format="csv", sep=",", inferSchema="true", header="true")
position2 = spark.read.load("/home/user/Camin_prim_stage2.csv", format="csv", sep=",", inferSchema="true", header="true")
position3 = spark.read.load("/home/user/Camin_prim_stage3.csv", format="csv", sep=",", inferSchema="true", header="true")

position = [position1, position2, position3]

m = folium.Map()
col=0
colArray=['red','blue','green']

# Check file was correctly loaded
for x in position:
# x.printSchema()
# x.show(2)

# Map position
coordinates = [[float(i.lat), float(i.lon)] for i in x.collect()]

# Make a Folium map
#m = folium.Map()
m.fit_bounds(coordinates, padding=(25, 25))
folium.PolyLine(locations=coordinates, weight=5, color=colArray[col]).add_to(m)
folium.Marker(coordinates[1], popup="Origin").add_to(m)
folium.Marker(coordinates[-1], popup="Destination").add_to(m)
col = col + 1
# Save to an html file
m.save('chamin_prim.html')

# Cleanup
spark.stop()

Continue reading