json

REST API to Spark Dataframe

With the growing number of users in the digital world, large volumes of raw data are being generated from which insights can be derived. This is where REST APIs come into the picture: they bridge the communication gap between the client (your software program) and the server (the website’s data). Introduction: REST APIs act as a gateway for two-way communication between two software applications.
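As a rough illustration of the client side of that exchange, the sketch below fetches JSON from an endpoint and hands the records to Spark. The helper names `fetch_records` and `records_to_dataframe` are hypothetical, and the sketch assumes the endpoint returns either one flat JSON object or an array of uniformly shaped flat objects.

```python
import json
from urllib.request import urlopen


def fetch_records(url, opener=urlopen):
    """Fetch JSON from a REST endpoint and return it as a list of dicts.

    `opener` is an assumption made for testability: any callable that takes
    a URL and returns a readable, closable response object will do.
    """
    with opener(url) as response:
        payload = json.loads(response.read().decode("utf-8"))
    # Wrap a single JSON object so the caller always receives a list.
    return payload if isinstance(payload, list) else [payload]


def records_to_dataframe(spark, records):
    """Build a Spark DataFrame from flat dicts (assumes an existing SparkSession)."""
    # Imported lazily so fetch_records can be used without Spark installed.
    from pyspark.sql import Row

    return spark.createDataFrame([Row(**record) for record in records])
```

Keeping the HTTP call and the Spark conversion in separate functions means the fetch step can be checked on its own before a Spark session is involved.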


Semi-Structured Data in Spark (pyspark) - JSON

In this post we discuss how to read semi-structured data such as JSON from different data sources and store it as a Spark dataframe. The Spark dataframe can in turn be used to perform aggregations and other data manipulations. Introduction: Previously we saw how to create and work with Spark dataframes. In this post we discuss how to read semi-structured data from different data sources, store it as a Spark dataframe, and manipulate it further.
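A minimal sketch of the idea, assuming a local file as the data source: `write_json_lines` and `load_json` are hypothetical helper names, and `spark` is assumed to be an existing SparkSession. By default `spark.read.json` expects one JSON object per line; the `multiLine` option covers files that hold a single pretty-printed document or array.

```python
import json


def write_json_lines(records, path):
    """Write one JSON object per line, the default layout for spark.read.json."""
    with open(path, "w", encoding="utf-8") as handle:
        for record in records:
            handle.write(json.dumps(record) + "\n")
    return path


def load_json(spark, path, multiline=False):
    """Read a JSON file into a Spark DataFrame; Spark infers the schema."""
    reader = spark.read
    if multiline:
        # Needed when a record spans several lines of the file.
        reader = reader.option("multiLine", True)
    return reader.json(path)
```

With a session in hand, something like `load_json(spark, path).groupBy("city").count()` would then be one way to aggregate the loaded records, where `city` stands in for whatever field the data actually contains.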
