Data Architect for Large-Scale AI Training

4 days ago


Melbourne, Victoria, Australia beBeeDataEngineer Full time $150,000 - $180,000
Job Summary

We are seeking a seasoned data engineer to lead the design and construction of datasets for AI training. The ideal candidate will have hands-on experience in sourcing, cleaning, transforming, and structuring massive amounts of raw data into training-ready form.

The selected individual will be responsible for designing the architecture that powers data ingestion, validation, and storage for AI training datasets ranging from multiple terabytes to petabytes.

This is a deeply technical role that requires expertise in writing code, building pipelines, defining schemas, and debugging unusual data edge cases at scale.

Key Responsibilities
  • Design and build large-scale data ingestion and curation pipelines for AI training datasets
  • Source, filter, and process diverse data types including text, structured data, code, and multimodal data, from raw form to model-ready format
  • Implement robust quality control and validation systems to ensure dataset integrity, relevance, and ethical compliance
  • Architect storage and retrieval systems optimised for distributed training at scale
  • Build tooling to track dataset lineage, reproducibility, and metadata at all stages of the pipeline
Requirements
  • Passionate about building world-class datasets for AI training, from raw source to training-ready form
  • Experienced in Python and data engineering frameworks such as Apache Spark, Ray, or Dask
  • Skilled in working with distributed data storage and processing systems such as S3, HDFS, or cloud object storage
  • Strong understanding of data quality, validation, and reproducibility in large-scale ML workflows
About Us

We offer a dynamic work environment that fosters collaboration and innovation.


