AI Engineer (Vision-Language-Action / Multimodal Systems)
- Location: San Francisco
- Expertise: Robotics
- Job Type: Permanent
- Salary: $150,000 per annum
A well-funded, early-stage robotics company is building next-generation autonomous systems designed to operate in complex, real-world environments.
Their focus is on developing general-purpose robotic platforms that combine cutting-edge AI with physical systems to tackle high-impact challenges across industrial and defense-adjacent applications.
As they scale, they’re investing heavily in multimodal AI and embodied intelligence to enable robots to understand, reason, and act in dynamic environments.
The Role
We are seeking an AI Engineer to develop and deploy advanced multimodal models that bridge perception, reasoning, and action in real-world robotic systems.
This role sits at the intersection of machine learning and robotics, with a focus on vision-language-action (VLA) and vision-language models (VLMs).
What You’ll Do
- Develop and optimize multimodal models (e.g. transformers, diffusion models, vision-language-action architectures)
- Build representations for perception, scene understanding, spatial reasoning, and affordances
- Integrate language-based reasoning with planning and control systems
- Design and curate large-scale multimodal datasets (video, teleoperation, synthetic data, instruction-based learning)
- Deploy models onto edge or onboard compute, optimizing for latency and reliability
- Build pipelines for training, evaluation, and scaling of ML systems
- Develop simulation-to-real (Sim2Real) workflows for robust real-world performance
- Collaborate closely with robotics, controls, and hardware teams to ensure models translate effectively into real-world behavior
- Participate in testing and iteration based on real-world system performance
What We’re Looking For
- Strong experience with multimodal machine learning (VLMs, VLAs, transformers, or similar)
- Deep expertise in PyTorch or JAX, including distributed training and GPU acceleration
- Experience building and scaling large training pipelines
- Strong software engineering skills in Python and modern ML tooling
- Experience with dataset creation, curation, and augmentation (including synthetic data)
- Understanding of deployment constraints on edge or embedded systems
- Degree (MSc/PhD preferred) in Computer Science, Machine Learning, Robotics, or related field, or equivalent experience
Nice to Have
- Experience with robotics, embodied AI, or real-world ML deployment
- Familiarity with simulation environments (e.g. MuJoCo, Isaac, or similar)
- Experience with reinforcement learning, imitation learning, or policy learning
- Exposure to real-time systems or safety-critical applications
Why This Role
- Work on cutting-edge embodied AI systems bridging language, vision, and action
- High ownership over model development and real-world deployment
- Opportunity to operate at the frontier of general-purpose robotics
- Fast-paced, highly technical team building from first principles
Additional Details
- Location: San Francisco (on-site)
- Compensation: Competitive base + equity
- Benefits: included
