Audio Model Trainer

TYPE OF WORK

Any

WAGE / SALARY

$21/hr

HOURS PER WEEK

TBD

DATE UPDATED

Aug 26, 2025

JOB OVERVIEW

We are seeking detail-oriented and enthusiastic individuals to join a cutting-edge AI research initiative. In this role, you will be responsible for recording short audio clips that describe visual content, helping to build and refine datasets for multimodal AI systems. Your voice will directly support the development of next-generation models capable of understanding and interacting with the world across both visual and auditory domains.

Responsibilities:
View a series of images and generate clear, concise, and natural-sounding spoken descriptions.
Record short audio clips (typically 2-3 minutes each) using provided tools or platforms.
Ensure recordings are high quality and free from background noise or distortion.
Follow specific linguistic, timing, or stylistic guidelines as outlined by the research team.
Collaborate with AI researchers and QA teams to review and iterate on data quality.

Qualifications:
Excellent verbal communication and enunciation skills.
Native or near-native fluency in English (fluency in other languages is a plus).
Strong attention to detail and the ability to follow annotation guidelines precisely.
Prior experience with voice recording or data annotation is a plus, but not required.
Comfortable working independently and handling repetitive tasks with consistency.

What You’ll Gain:
An opportunity to contribute to foundational AI research at a world-leading lab.
Experience working at the intersection of language, audio, and computer vision.
Flexible, remote-friendly work structure.

Pay:
You will be paid $21/hour.

ONLY APPLICATIONS SUBMITTED THROUGH THE LINK WILL BE REVIEWED.
Schedule your interview here: ----------