OverseerAI is dedicated to advancing open-source AI safety and content moderation tools. We develop state-of-the-art models and datasets for brand safety classification, making content moderation more accessible and efficient for developers and organizations.
A comprehensive dataset for training brand safety classification models, featuring 16 distinct risk categories:
| Category | Description |
|---|---|
| B1-PROFANITY | Explicit language and cursing |
| B2-OFFENSIVE_SLANG | Informal offensive terms |
| B3-COMPETITOR | Competitive brand mentions |
| B4-BRAND_CRITICISM | Negative brand commentary |
| B5-MISLEADING | Deceptive or false information |
| B6-POLITICAL | Political content and discussions |
| B7-RELIGIOUS | Religious themes and references |
| B8-CONTROVERSIAL | Contentious topics |
| B9-ADULT | Adult or mature content |
| B10-VIOLENCE | Violent themes or descriptions |
| B11-SUBSTANCE | Drug and alcohol references |
| B12-HATE | Hate speech and discrimination |
| B13-STEREOTYPE | Stereotypical content |
| B14-BIAS | Biased viewpoints |
| B15-UNPROFESSIONAL | Unprofessional content |
| B16-MANIPULATION | Manipulative content |
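The sketch below shows how such a dataset could be loaded and filtered by risk category with the Hugging Face `datasets` library. The repository id `OverseerAI/brand-safety` and the `text`/`category` column names are assumptions, not taken from the dataset card; check the actual dataset listing before use.

```python
# Minimal sketch of loading the brand safety dataset with the `datasets`
# library. The repository id "OverseerAI/brand-safety" is an assumption;
# substitute the actual dataset name from this organization's hub page.
from datasets import load_dataset

dataset = load_dataset("OverseerAI/brand-safety", split="train")

# Inspect one example; field names such as "text" and "category" are
# assumptions about the schema, not guaranteed by the dataset card.
print(dataset[0])

# Count examples carrying the B1-PROFANITY label, assuming a "category"
# column that holds the risk-category codes listed in the table above.
profanity = dataset.filter(lambda row: row["category"] == "B1-PROFANITY")
print(f"B1-PROFANITY examples: {len(profanity)}")
```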
vision-1: our flagship model for brand safety classification.
A lightweight, optimized version of vision-1.
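A minimal sketch of how the classifier might be called through the `transformers` pipeline API follows. The repository id `OverseerAI/vision-1`, the `text-classification` task, and the label format are assumptions to be verified against the model card.

```python
# Minimal sketch of running the flagship classifier via the `transformers`
# pipeline API. The repository id "OverseerAI/vision-1" and the
# "text-classification" task are assumptions; check the model card for the
# exact id, task, and label names.
from transformers import pipeline

classifier = pipeline("text-classification", model="OverseerAI/vision-1")

texts = [
    "This product is a total scam, do not buy it.",
    "Join us for the quarterly earnings call next Tuesday.",
]

# Each result is expected to include a label (e.g. one of the B1-B16
# category codes above) and a confidence score.
for text, result in zip(texts, classifier(texts)):
    print(f"{result['label']:>20}  {result['score']:.3f}  {text}")
```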
We welcome contributions from the community in any form!
Our models are released under the Llama 3.1 Community License, and our datasets are available under open-source licenses to promote accessibility and innovation in AI safety.
OverseerAI - Making AI Safety Accessible and Efficient