Data Engineer Intern at Bauhaus Group
We are seeking a motivated and detail-oriented Data Engineer Intern to join our dynamic team. This internship offers a unique opportunity to gain hands-on experience in data engineering, analytics, and visualization within the real estate industry. As a Data Engineer Intern, you will work closely with our data science and engineering teams to support various data-driven projects.
Key Responsibilities:
- Assist in the design, development, and maintenance of data pipelines and processes.
- Collaborate with cross-functional teams to gather and process data from multiple sources.
- Utilize JIRA for task management, tracking project progress, and reporting.
- Participate in Agile development processes, including sprint planning, daily stand-ups, and retrospectives.
- Perform data cleaning, transformation, and aggregation to prepare datasets for analysis.
- Create and maintain documentation for data workflows and processes.
- Develop and maintain dashboards and reports using Tableau and other data visualization tools.
- Conduct exploratory data analysis and generate insights to support business decisions.
- Assist in the implementation of data quality and governance standards.
Requirements:
- Currently pursuing a degree in Computer Science, Data Science, Engineering, or a related field.
- Strong proficiency in R for data manipulation and analysis.
- Basic understanding of SQL and database management.
- Familiarity with data warehousing concepts and ETL processes.
- Excellent problem-solving skills and attention to detail.
- Strong communication skills and the ability to work collaboratively in a team environment.
- Understanding of Agile methodologies and experience with JIRA or similar project management tools.
Preferred Qualifications:
- Experience with cloud platforms such as AWS or Google Cloud.
- Knowledge of additional programming languages such as Python.
- Exposure to real estate data and analytics.
- Understanding of data quality and governance practices.
What We Offer:
- Hands-on experience with real-world data engineering projects.
- Mentorship and guidance from experienced professionals.
- Opportunity to work with cutting-edge technologies and tools.
- A collaborative and supportive work environment.
Training:
Initial General Employer Orientation (Mandatory):
Familiarization with company culture, policies, and tools (JIRA, Tableau, GCP, data warehousing solutions).
Duration: 1 day
Weekly Scheduled One-on-One Mentor Meetings:
Personalized guidance, feedback, and goal setting, focusing on ETL processes, data modeling, and pipeline optimization.
Duration: 3 hours per week
Job Shadowing:
Practical exposure to designing and implementing ETL pipelines, data extraction, transformation, and storage optimization.
Duration: 2 weeks (2 hours daily)
Workshops/Skills Training:
Enhancing technical skills in R, SQL, Tableau, ETL design, and cloud platforms (GCP).
Duration: 4 weeks (weekly 2-hour sessions)
Provision of Work Samples:
Access to previous projects and tasks for reference, including Tableau dashboards and Python scripts.
Duration: Ongoing (as needed)
Overview/Contextualization of Assigned Tasks:
Detailed briefing on tasks, including objectives, expected outcomes, and relevance to data engineering projects.
Duration: Before the start of each new task
Training Literature Reviews and Testing:
Reading and assessments covering best practices in data analysis and data transformation.
Learning Outcomes:
Proficiency in Data Pipeline Development:
Successfully design, develop, and maintain efficient ETL (Extract, Transform, Load) data pipelines that handle large volumes of structured and unstructured data.
Demonstrate the ability to integrate data from multiple sources such as APIs, relational databases, and flat files into a cohesive and accessible format.
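As a rough illustration of this outcome, here is a minimal ETL sketch in R (the language this posting centers on). The file listings.csv, the database warehouse.db, and the transactions table are hypothetical placeholders, not actual company sources.

```r
# A minimal ETL sketch: extract from a flat file and a database table,
# transform by joining, and load the result back to the warehouse.
# All source names below are illustrative assumptions.
library(DBI)
library(RSQLite)
library(dplyr)
library(readr)

# Extract: a flat file and a relational table (hypothetical names).
listings <- read_csv("listings.csv", show_col_types = FALSE)
con <- dbConnect(SQLite(), "warehouse.db")
transactions <- dbReadTable(con, "transactions")

# Transform: standardize the join key and integrate the two sources.
combined <- listings %>%
  mutate(listing_id = as.integer(listing_id)) %>%
  inner_join(transactions, by = "listing_id")

# Load: write the integrated dataset back to the warehouse.
dbWriteTable(con, "listings_enriched", combined, overwrite = TRUE)
dbDisconnect(con)
```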
Enhanced Data Analysis and Visualization Skills:
Gain hands-on experience in data cleaning, transformation, and aggregation using R, SQL, and other data manipulation tools, leading to the preparation of high-quality datasets.
Develop and maintain interactive dashboards and reports using Tableau and other BI (Business Intelligence) tools, providing valuable insights for data-driven decision-making.
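For a concrete sense of the cleaning, transformation, and aggregation work described above, a small dplyr sketch; the inline sample data stands in for a raw real estate extract and is purely illustrative.

```r
# Clean and aggregate a messy extract: drop incomplete rows,
# normalize inconsistent labels, then summarize per group.
library(dplyr)

raw <- tibble::tibble(
  city  = c("Austin", "austin", "Dallas", NA, "Dallas"),
  price = c(450000, 470000, NA, 390000, 410000)
)

clean <- raw %>%
  filter(!is.na(city), !is.na(price)) %>%                # cleaning
  mutate(city = tools::toTitleCase(tolower(city))) %>%   # transformation
  group_by(city) %>%
  summarise(avg_price = mean(price), n = n())            # aggregation

print(clean)  # one row per city, ready to serve as a Tableau data source
```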
Agile Project Management Experience:
Actively participate in Agile Scrum processes, including sprint planning, daily stand-ups, sprint reviews, and retrospectives.
Utilize JIRA for effective task management, tracking project progress, and generating sprint reports, ensuring timely completion of deliverables.
Improved Collaboration and Communication Skills:
Collaborate effectively with cross-functional teams, including data scientists, data engineers, software developers, and business stakeholders, to gather and process data requirements.
Enhance communication skills by creating comprehensive documentation for data workflows, ETL processes, and data governance protocols.
Knowledge of Data Quality and Governance:
Assist in the implementation of data quality and governance standards, ensuring the accuracy, consistency, and reliability of data through data profiling, validation, and monitoring.
Develop a strong understanding of data warehousing concepts such as OLAP (Online Analytical Processing), OLTP (Online Transaction Processing), and data lakes.
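As a sketch of the data profiling and validation mentioned in the first point above, a base-R validation function; the specific rules and the 5% missing-value threshold are assumptions for illustration, not company standards.

```r
# Run simple data quality checks and flag any that fail.
# Rules and thresholds here are illustrative assumptions.
validate_listings <- function(df) {
  checks <- list(
    no_missing_ids  = !any(is.na(df$listing_id)),
    unique_ids      = !any(duplicated(df$listing_id)),
    positive_prices = all(df$price > 0, na.rm = TRUE),
    price_complete  = mean(is.na(df$price)) < 0.05  # under 5% missing
  )
  failed <- names(checks)[!unlist(checks)]
  if (length(failed) > 0) {
    warning("Failed checks: ", paste(failed, collapse = ", "))
  }
  checks
}

# Usage: this sample deliberately violates three of the four rules.
sample_df <- data.frame(listing_id = c(1, 2, 2), price = c(100, -5, NA))
validate_listings(sample_df)
```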
Technical Skill Advancement:
Strengthen proficiency in R for data manipulation, statistical analysis, and visualization, along with a foundational understanding of SQL for database querying and management.
Gain exposure to cloud platforms such as AWS (Amazon Web Services) or Google Cloud Platform (GCP) and additional programming languages like Python for data engineering tasks, broadening technical expertise.
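To make the pairing of foundational SQL with R analysis concrete, a hedged sketch; it uses an in-memory SQLite table as a stand-in for whatever database the team actually works with, and the sales figures are invented for illustration.

```r
# Query with SQL, then analyze and visualize the result in R.
library(DBI)
library(RSQLite)

con <- dbConnect(SQLite(), ":memory:")
dbWriteTable(con, "sales", data.frame(     # invented sample data
  year  = c(2022, 2022, 2023, 2023),
  price = c(410000, 390000, 455000, 480000)
))

# Foundational SQL: aggregate in the database, not in R.
yearly <- dbGetQuery(con, "SELECT year, AVG(price) AS avg_price
                           FROM sales GROUP BY year")

summary(yearly$avg_price)        # basic statistical summary in R
barplot(yearly$avg_price,        # simple base-R visualization
        names.arg = yearly$year,
        main = "Average sale price by year")
dbDisconnect(con)
```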