Responsibilities
Build data ingestion and processing pipelines that enable data analytics and data science use cases across digital commerce, service operations, charging, reliability, finance, capex, warranty, customer service, and other areas.
Build a modular set of data services using Python, SQL, AWS Glue, AWS Lambda, API Gateway, Kafka, dbt (data build tool), and Apache Spark on EMR, among others
Build automated unit and integration testing pipelines for PySpark-based jobs
Create and manage CI/CD pipelines with GitLab CI and AWS CodePipeline/CodeDeploy
Automate and schedule jobs using Managed Airflow
Build the ODS and reporting schemas and load the data into Amazon Redshift or Snowflake
Design and build data quality management services with Apache Deequ and data observability tools such as Splunk, Datadog, CloudWatch, and Monte Carlo
Provide a variety of query services via REST, Athena/Presto, and server-sent events
Configure and set up enterprise data lineage, metadata management, and data catalog support using tools like Collibra or Alation
Assist data scientists within the data engineering team, as well as other software engineering teams, with data cleansing, wrangling, and feature engineering
Ensure green builds for deployment and work with program management and senior leads to burn down planned deliverables in a sprint cycle
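The unit and integration testing responsibility above can be sketched in miniature: transformation logic is factored into plain functions so it can be tested without a Spark cluster. The function and field names here are illustrative assumptions, not part of the role's actual codebase.

```python
# Hypothetical transformation factored out of a PySpark job so it can be
# unit-tested without a SparkSession; record shape is an assumption.
def normalize_warranty_record(record: dict) -> dict:
    """Trim string fields and coerce the claim amount to a float."""
    return {
        "claim_id": record["claim_id"].strip(),
        "amount": float(record["amount"]),
        "region": record["region"].strip().lower(),
    }

def test_normalize_warranty_record():
    raw = {"claim_id": " C-101 ", "amount": "42.50", "region": " West "}
    out = normalize_warranty_record(raw)
    assert out == {"claim_id": "C-101", "amount": 42.5, "region": "west"}
```

In a real pipeline the same function would be applied inside a Spark UDF or `DataFrame` transformation, while the test runs in a fast, cluster-free CI job.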
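The data quality responsibility above centers on declarative checks of the kind Apache Deequ evaluates, such as completeness and uniqueness constraints. A minimal pure-Python sketch of those two checks, with made-up sample rows, might look like:

```python
# Illustrative versions of two constraint types a tool like Apache Deequ
# verifies at scale; the rows below are fabricated sample data.
def completeness(rows: list[dict], column: str) -> float:
    """Fraction of rows with a non-null value in `column`."""
    non_null = sum(1 for r in rows if r.get(column) is not None)
    return non_null / len(rows)

def is_unique(rows: list[dict], column: str) -> bool:
    """True if no value in `column` repeats across rows."""
    values = [r[column] for r in rows]
    return len(values) == len(set(values))

rows = [
    {"order_id": 1, "email": "a@example.com"},
    {"order_id": 2, "email": None},
    {"order_id": 3, "email": "c@example.com"},
]
# completeness(rows, "email") → 2/3; is_unique(rows, "order_id") → True
```

In production these checks would run against Spark DataFrames, and failures would surface through the observability stack (Datadog, CloudWatch, Monte Carlo) rather than in-process assertions.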
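One of the query services listed above is server-sent events. The wire format is standardized (each event is a set of `field: value` lines ended by a blank line), so a small serializer sketch, assuming a text payload, is:

```python
def sse_event(data: str, event: str = "") -> str:
    """Serialize one server-sent event: an optional `event:` field,
    one `data:` line per payload line, and a blank-line terminator."""
    lines = []
    if event:
        lines.append(f"event: {event}")
    lines.extend(f"data: {line}" for line in data.splitlines() or [""])
    lines.append("")  # blank line ends the event
    return "\n".join(lines) + "\n"

# sse_event("hello", event="update") → "event: update\ndata: hello\n\n"
```

A service would stream these strings over an HTTP response with `Content-Type: text/event-stream`; the framework carrying the stream (API Gateway, a Lambda, etc.) is outside this sketch.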
Last updated on Nov 10, 2023