
XUS_IN_Data Engineer

xgs · XUS-IN · Negotiable · Full-time

About Xebia

Xebia is a trusted advisor in the modern era of digital transformation, serving hundreds of leading brands worldwide with end-to-end IT solutions. The company has experts specializing in technology consulting, software engineering, AI, digital products and platforms, data, cloud, intelligent automation, agile transformation, and industry digitization. In addition to providing high-quality digital consulting and state-of-the-art software development, Xebia has a host of standardized solutions that substantially reduce the time-to-market for businesses.

Xebia also offers a diverse portfolio of training courses to help forward-thinking organizations upskill their workforce and capitalize on the latest digital capabilities. The company has a strong presence across 16 countries, with development centres in the US, Latin America, Western Europe, Poland, the Nordics, the Middle East, and Asia Pacific.

Responsibilities

  • Establish scalable, efficient, automated processes for data analysis, data model development, validation, and implementation.
  • Work closely with analysts and data scientists to understand the impact on downstream data models.
  • Write efficient and well-organized software to ship products in an iterative, continual release environment.
  • Contribute to and promote good software engineering practices across the team.
  • Communicate clearly and effectively to technical and non-technical audiences.

Minimum Qualifications:

  • University or advanced degree in engineering, computer science, mathematics, or a related field.
  • Strong hands-on experience in Databricks using PySpark and Spark SQL (Unity Catalog, Workflows, optimization techniques).
  • Experience with at least one cloud provider solution (GCP preferred).
  • Strong experience working with relational SQL databases.
  • Strong experience with an object-oriented or functional scripting language, ideally Python.
  • Working knowledge of data transformation tools; dbt preferred.
  • Ability to work on the Linux platform.
  • Strong knowledge of data pipeline and workflow management tools (Airflow).
  • Working knowledge of GitHub and the Git toolkit.
  • Expertise in standard software engineering methodology, e.g., unit testing, code reviews, and design documentation.
  • Experience creating data pipelines that prepare data appropriately for ingestion and consumption.
  • Experience maintaining and optimizing databases and filesystems for production use in reporting and analytics.
  • Ability to work in a collaborative environment and interact effectively with both technical and non-technical team members. Good verbal and written communication skills.

Questionnaire

Scenario 1: Data Pipeline Design on GCP

You are tasked with designing a data pipeline to process and analyze log data generated by a web application. The log data is stored in Google Cloud Storage (GCS) and needs to be ingested, transformed, and loaded into BigQuery for reporting and analysis.

Requirements:

  • Ingestion: The log data should be ingested from GCS into a staging area in BigQuery.
  • Transformation: Apply the necessary transformations, such as parsing JSON logs, filtering out irrelevant data, and aggregating metrics.
  • Loading: Load the transformed data into a final table in BigQuery for analysis.
  • Orchestration: The entire pipeline should be orchestrated to run daily.
  • Monitoring and Alerting: Set up monitoring and alerting to ensure the pipeline runs successfully and errors are detected promptly.

Questions:

1) Ingestion:

What GCP services would you use to ingest the log data from GCS to BigQuery, and why?

Provide an example of how you would configure this ingestion process.
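
One reasonable answer: for batch log files already sitting in GCS, a BigQuery load job (triggered from the orchestrator) is the simplest and cheapest route; Dataflow becomes attractive only if heavy in-flight parsing is needed. A minimal sketch using the google-cloud-bigquery client; the bucket, project, dataset, and table names are illustrative assumptions.

```python
from google.cloud import bigquery

client = bigquery.Client()

# Load newline-delimited JSON logs from GCS into a staging table.
# Bucket, project, dataset, and table names are placeholders.
job_config = bigquery.LoadJobConfig(
    source_format=bigquery.SourceFormat.NEWLINE_DELIMITED_JSON,
    autodetect=True,  # let BigQuery infer the staging schema from the logs
    write_disposition=bigquery.WriteDisposition.WRITE_APPEND,
)

load_job = client.load_table_from_uri(
    "gs://example-app-logs/dt=2024-08-16/*.json",
    "my_project.staging.web_logs",
    job_config=job_config,
)
load_job.result()  # block until the load job completes
print(f"Loaded {load_job.output_rows} rows into staging.web_logs")
```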

2) Transformation:

Describe how you would implement the transformation step. What tools or services would you use?

Provide an example transformation you might perform on the log data.
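
Since the data already lands in BigQuery, an ELT approach (SQL in BigQuery, optionally managed through dbt) is a natural choice. A hedged example, assuming the staging table keeps each raw log line in a single payload STRING column; all names are placeholders.

```python
from google.cloud import bigquery

client = bigquery.Client()

# Example transformation: parse fields out of the raw JSON payload, drop
# health-check noise, and aggregate request counts and latency per
# endpoint per day. Assumes a `payload` STRING column in staging.
transform_sql = """
SELECT
  DATE(TIMESTAMP(JSON_EXTRACT_SCALAR(payload, '$.timestamp'))) AS log_date,
  JSON_EXTRACT_SCALAR(payload, '$.endpoint') AS endpoint,
  COUNT(*) AS request_count,
  AVG(CAST(JSON_EXTRACT_SCALAR(payload, '$.latency_ms') AS FLOAT64)) AS avg_latency_ms
FROM `my_project.staging.web_logs`
WHERE JSON_EXTRACT_SCALAR(payload, '$.endpoint') != '/healthz'
GROUP BY log_date, endpoint
"""

# Materialize the aggregated result into the final reporting table.
job_config = bigquery.QueryJobConfig(
    destination="my_project.analytics.daily_endpoint_metrics",
    write_disposition=bigquery.WriteDisposition.WRITE_TRUNCATE,
)
client.query(transform_sql, job_config=job_config).result()
```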

3) Loading:

How would you design the schema for the final BigQuery table to ensure efficient querying?

What considerations would you take into account when loading data into BigQuery?
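
One plausible shape, carrying over the column names assumed in the transformation sketch: partition by date so date-bounded queries prune whole partitions, and cluster by the most commonly filtered column. Loading considerations then include aligning loads with the partition scheme and choosing an append vs. truncate disposition per run.

```python
from google.cloud import bigquery

client = bigquery.Client()

# A possible final schema (names are assumptions from the sketches above).
schema = [
    bigquery.SchemaField("log_date", "DATE", mode="REQUIRED"),
    bigquery.SchemaField("endpoint", "STRING", mode="REQUIRED"),
    bigquery.SchemaField("request_count", "INT64"),
    bigquery.SchemaField("avg_latency_ms", "FLOAT64"),
]

table = bigquery.Table("my_project.analytics.daily_endpoint_metrics", schema=schema)
# Daily partitions let date-bounded queries scan only the days they need.
table.time_partitioning = bigquery.TimePartitioning(
    type_=bigquery.TimePartitioningType.DAY,
    field="log_date",
)
# Clustering on endpoint lets BigQuery skip blocks within each partition.
table.clustering_fields = ["endpoint"]

client.create_table(table, exists_ok=True)
```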

4) Orchestration:

Which GCP service would you use to orchestrate the data pipeline, and why?

Outline a high-level workflow for the daily orchestration of the pipeline.
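
Cloud Composer (managed Airflow) lines up with the Airflow requirement in this role. A skeleton daily DAG using the Google provider operators; DAG, bucket, and table names are placeholders, and the query is a stand-in.

```python
from datetime import datetime

from airflow import DAG
from airflow.providers.google.cloud.operators.bigquery import BigQueryInsertJobOperator
from airflow.providers.google.cloud.transfers.gcs_to_bigquery import GCSToBigQueryOperator

TRANSFORM_SQL = "SELECT 1"  # stand-in for the transformation SQL sketched earlier

# Minimal daily DAG for Cloud Composer; names and paths are placeholders.
with DAG(
    dag_id="web_log_pipeline",
    schedule_interval="@daily",
    start_date=datetime(2024, 8, 1),
    catchup=False,
) as dag:
    # Step 1: load the run date's logs from GCS into the staging table.
    ingest = GCSToBigQueryOperator(
        task_id="ingest_logs_to_staging",
        bucket="example-app-logs",
        source_objects=["dt={{ ds }}/*.json"],
        destination_project_dataset_table="my_project.staging.web_logs",
        source_format="NEWLINE_DELIMITED_JSON",
        write_disposition="WRITE_APPEND",
        autodetect=True,
    )

    # Step 2: run the transformation SQL and materialize the final table.
    transform = BigQueryInsertJobOperator(
        task_id="transform_and_load",
        configuration={"query": {"query": TRANSFORM_SQL, "useLegacySql": False}},
    )

    ingest >> transform
```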

5) Monitoring and Alerting:

What strategies would you use to monitor the pipeline's performance?

How would you set up alerts to notify you of any issues?
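
On GCP this usually combines Cloud Monitoring (alerting policies on Composer and BigQuery job metrics, log-based alerts on pipeline error logs) with alerting wired into the DAG itself. A sketch of the DAG-level layer; the notifier is a placeholder for a real Slack/PagerDuty/email hook.

```python
from datetime import timedelta

# Placeholder notifier; in practice this would post to Slack, PagerDuty,
# or email via the corresponding Airflow provider.
def notify_failure(context):
    ti = context["task_instance"]
    print(f"ALERT: {ti.dag_id}.{ti.task_id} failed for run {context['ds']}")

# Passed as DAG(default_args=...) in the DAG sketched above.
default_args = {
    "retries": 2,                           # absorb transient failures first
    "retry_delay": timedelta(minutes=5),
    "on_failure_callback": notify_failure,  # fire once retries are exhausted
    "sla": timedelta(hours=2),              # flag runs exceeding the expected window
}
```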

Scenario 2: Optimizing BigQuery Queries

You are responsible for optimizing BigQuery queries to improve performance and reduce costs. You notice that a frequently run query is taking longer than expected and is costly.

Questions:

1) Performance Analysis:

How would you analyze the performance of a BigQuery query?

What specific metrics or logs would you look at to identify inefficiencies?
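
Beyond the execution graph in the console (stage timings, shuffle, slot usage), job metadata in INFORMATION_SCHEMA surfaces the most expensive queries. A sketch, assuming jobs run in the US region; the region qualifier and project layout are assumptions.

```python
from google.cloud import bigquery

client = bigquery.Client()

# Rank the last week's queries by bytes billed to find the heavy hitters.
stats_sql = """
SELECT
  job_id,
  user_email,
  total_bytes_billed,
  total_slot_ms,
  TIMESTAMP_DIFF(end_time, start_time, SECOND) AS runtime_seconds
FROM `region-us`.INFORMATION_SCHEMA.JOBS_BY_PROJECT
WHERE job_type = 'QUERY'
  AND creation_time > TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 7 DAY)
ORDER BY total_bytes_billed DESC
LIMIT 20
"""
for row in client.query(stats_sql).result():
    print(row.job_id, row.total_bytes_billed, row.runtime_seconds)
```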

2) Optimization Techniques:

List at least three techniques you would use to optimize a BigQuery query.

Explain how each technique improves performance or reduces costs.
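
Common levers include selecting only the needed columns (no SELECT *), filtering on partition columns so pruning applies, and materializing frequently reused aggregates. Whichever technique is applied, a dry run quantifies its effect on bytes scanned before anything is billed; the table and column names below are assumptions.

```python
from google.cloud import bigquery

client = bigquery.Client()

# A dry run estimates bytes processed without running (or paying for)
# the query, making before/after comparisons of an optimization cheap.
job_config = bigquery.QueryJobConfig(dry_run=True, use_query_cache=False)
job = client.query(
    "SELECT endpoint, request_count "
    "FROM `my_project.analytics.daily_endpoint_metrics` "
    "WHERE log_date = '2024-08-15'",
    job_config=job_config,
)
print(f"Estimated bytes processed: {job.total_bytes_processed}")
```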

3) Partitioning and Clustering:

Describe how you would use partitioning and clustering in BigQuery to optimize query performance.

Provide an example scenario where each technique would be beneficial.
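
A sketch of both in DDL form (table and column names assumed): partitioning benefits date-bounded reporting queries, while clustering benefits selective lookups within those dates.

```python
from google.cloud import bigquery

client = bigquery.Client()

# Partition by day so date filters prune whole partitions; cluster by the
# usual filter column so blocks within each partition can be skipped.
client.query("""
CREATE TABLE IF NOT EXISTS `my_project.analytics.events`
(
  event_ts   TIMESTAMP,
  user_id    STRING,
  event_type STRING
)
PARTITION BY DATE(event_ts)
CLUSTER BY user_id
""").result()

# This query scans only the requested day's partition, and within it only
# the clustered blocks containing the requested user.
client.query("""
SELECT event_type, COUNT(*) AS n
FROM `my_project.analytics.events`
WHERE DATE(event_ts) = '2024-08-15'
  AND user_id = 'user-123'
GROUP BY event_type
""").result()
```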

Scenario 3: Data Migration to GCP

Your organization is migrating its on-premises data warehouse to Google Cloud Platform. You need to design and implement a migration strategy.

Questions:

1) Planning and Assessment:

What factors would you consider when planning the migration of an on-premises data warehouse to GCP?

How would you assess the readiness of your existing data warehouse for migration?

2) Migration Strategy:

Describe the steps you would take to migrate data from an on-premises data warehouse to BigQuery.

What tools or services would you use to facilitate the migration?

3) Post-Migration Optimization:

After migrating the data, how would you optimize the new BigQuery data warehouse for performance and cost-efficiency?

What best practices would you follow to ensure the migrated data is accurate and queryable?
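
A common accuracy check is reconciling row counts (and, in stricter setups, column-level checksums) between the source export and the migrated tables. A sketch with placeholder table names and counts recorded during export.

```python
from google.cloud import bigquery

client = bigquery.Client()

# Row counts captured from the source warehouse during export
# (the tables and values here are placeholders).
source_counts = {"orders": 1_204_332, "customers": 88_410}

# Compare against the migrated BigQuery tables; a mismatch points at
# dropped or duplicated rows during migration.
for table, expected in source_counts.items():
    row = next(iter(client.query(
        f"SELECT COUNT(*) AS n FROM `my_project.warehouse.{table}`"
    ).result()))
    status = "OK" if row.n == expected else "MISMATCH"
    print(f"{table}: source={expected} bigquery={row.n} -> {status}")
```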

Scenario 4: Real-time Data Processing on GCP

Your company requires real-time data processing to analyze streaming data from IoT devices. The data needs to be ingested, processed, and stored for further analysis.

Questions:

1) Ingestion:

What GCP service(s) would you use to ingest real-time streaming data from IoT devices?

Explain the benefits of using these services for real-time data ingestion.
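
Pub/Sub is the usual entry point: it decouples devices from consumers, scales automatically, and retains undelivered messages. A minimal publisher sketch; the project, topic, and payload field names are placeholders.

```python
import json

from google.cloud import pubsub_v1

# Minimal publisher sketch; project and topic names are placeholders.
publisher = pubsub_v1.PublisherClient()
topic_path = publisher.topic_path("my-project", "iot-telemetry")

reading = {"device_id": "sensor-42", "temperature_c": 21.7, "ts": "2024-08-16T12:00:00Z"}
future = publisher.publish(
    topic_path,
    data=json.dumps(reading).encode("utf-8"),
    device_id=reading["device_id"],  # attributes enable subscription filtering
)
print(f"Published message {future.result()}")
```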

2) Processing:

Describe how you would implement real-time data processing on GCP.

Which GCP services would you use, and why?
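
Dataflow running an Apache Beam pipeline is the standard managed option. A compact streaming sketch (placeholder names; on Dataflow you would additionally pass runner, project, and region options): read from Pub/Sub, window into one-minute fixed windows, average per device, and write to BigQuery.

```python
import json

import apache_beam as beam
from apache_beam.options.pipeline_options import PipelineOptions

# Streaming Beam pipeline sketch; topic, table, and field names are
# placeholders consistent with the publisher sketch above.
options = PipelineOptions(streaming=True)

with beam.Pipeline(options=options) as p:
    (
        p
        | "Read" >> beam.io.ReadFromPubSub(topic="projects/my-project/topics/iot-telemetry")
        | "Parse" >> beam.Map(lambda msg: json.loads(msg.decode("utf-8")))
        | "Window" >> beam.WindowInto(beam.window.FixedWindows(60))  # 1-minute windows
        | "KeyByDevice" >> beam.Map(lambda r: (r["device_id"], r["temperature_c"]))
        | "Average" >> beam.combiners.Mean.PerKey()
        | "ToRow" >> beam.Map(lambda kv: {"device_id": kv[0], "avg_temperature_c": kv[1]})
        | "Write" >> beam.io.WriteToBigQuery(
            "my_project:analytics.device_minute_averages",
            schema="device_id:STRING,avg_temperature_c:FLOAT",
            write_disposition=beam.io.BigQueryDisposition.WRITE_APPEND,
        )
    )
```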

3) Storage:

How would you store the processed real-time data for efficient querying and analysis?

What considerations would you take into account when choosing a storage solution?

One-liner GCP questions:

How do you secure data in Google Cloud Storage?

What is the difference between Google BigQuery and Google Cloud SQL?

How do you implement data pipeline automation in Google Cloud?

Can you explain the role of Google Cloud Pub/Sub in data processing?

What strategies do you use for cost optimization in Google Cloud?

How do you handle schema changes in Google BigQuery?
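
For illustration, an additive change via the Python client, which is a metadata-only update (table and column names assumed); relaxing column modes or dropping columns instead goes through ALTER TABLE or a table rewrite.

```python
from google.cloud import bigquery

client = bigquery.Client()

# Appending a NULLABLE column is an in-place metadata update; existing
# rows simply read NULL for the new field.
table = client.get_table("my_project.analytics.daily_endpoint_metrics")
new_schema = list(table.schema)
new_schema.append(bigquery.SchemaField("error_rate", "FLOAT64", mode="NULLABLE"))
table.schema = new_schema
client.update_table(table, ["schema"])
```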

What is the purpose of Google Dataflow, and when would you use it?

How do you monitor and troubleshoot performance issues in Google Cloud Dataproc?

Explain the difference between managed and unmanaged instance groups in GCP.

How would you design a data warehouse architecture on GCP?

Last updated on Aug 16, 2024
