bigdata

Semi-Structured Data in Spark (pyspark) - JSON

In this post we discuss how to read semi-structured data such as JSON from different data sources and store it as a spark dataframe. The spark dataframe can in turn be used to perform aggregations and all sorts of data manipulations. Introduction Previously we saw how to create and work with spark dataframes. In post we discuss how to read semi-structured data from different data sources and store it as a spark dataframe and how to do further data manipulations.

Continue reading