Developed Spark applications using PySpark and Spark SQL to extract, transform, and aggregate data from multiple file formats for analysis.
Hands-on experience with Hadoop ecosystem components, including HDFS, Hive, Sqoop, Spark, and Kafka.
Implemented a data pipeline in Azure Data Factory to move data from ACME's systems to Azure Data Lake services, using T-SQL, Spark SQL, and Azure Synapse Analytics.
Built a Spark data pipeline using Azure Synapse Analytics and Synapse SQL.
Used the Spark DataFrame API and Spark SQL to transform data into a meaningful form for visualization.