Building a Trust & Safety Program from Scratch: Lessons from Amazon, Google, and TikTok
Every successful platform eventually needs Trust & Safety. The question is whether you build it proactively or reactively after a crisis.
I've built Trust & Safety operations from scratch twice—at TikTok LATAM (zero to regional safety infrastructure) and within new product launches at Amazon and Google. Here's what actually works when you're starting from nothing.
The Biggest Mistake: Waiting Too Long
The most common pattern I see:
- Launch: "We'll deal with safety later"
- Early growth: "We're too small to have real safety issues"
- Inflection point: First serious incident, regulatory inquiry, or press coverage
- Panic: Scramble to build safety infrastructure under crisis pressure
- Over-correct: Spend 2-3x more than if you'd built it proactively
Reality: It's much cheaper to build safety infrastructure before you need it than to retrofit it during a crisis.
When to Start Building Trust & Safety
Bare minimum triggers:
- You have user-generated content (UGC)
- Users can interact with each other
- You handle sensitive data (financial, health, children)
- You operate in a regulated industry
- You're launching in the EU (DSA requirements)
Ideal timing:
- Before public launch if high-risk category (social, dating, kids, marketplace)
- At 1,000-10,000 users if medium-risk category
- Before Series A if you want clean due diligence
Too late:
- After first serious safety incident
- During regulatory enforcement
- When investors ask "who owns Trust & Safety?" and you have no answer
Phase 1: Foundation (Weeks 1-4)
Start with the minimum viable safety infrastructure:
1. Define What "Unsafe" Means for Your Platform
Write down specific prohibited content and behavior for your platform.
Start with legal minimums:
- Illegal content in your primary market (US, EU, etc.)
- CSAM (child sexual abuse material)
- Terrorism and violent extremism
- Illegal goods and services
- Copyright/IP infringement
Add platform-specific harms:
- Harassment and bullying
- Hate speech and discrimination
- Spam and manipulation
- Fraud and scams
- Misinformation (if relevant to your platform)
Document as policy: Create community guidelines or an acceptable use policy (a machine-readable sketch follows the example below)
Example (Dating App):
- CSAM - prohibited, report to NCMEC
- Harassment - unwanted contact after block
- Catfishing - fake identity/photos
- Solicitation - prostitution or trafficking
- Scams - romance scams, financial fraud
- Hate speech - discrimination based on protected characteristics
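Capturing the taxonomy as structured data from day one keeps enforcement, logging, and later transparency reporting consistent. Here is a minimal sketch in Python; the category names, actions, and severities are illustrative, not a complete policy:

```python
# Minimal policy taxonomy as structured data. Category names, actions,
# and severities below are illustrative, not a complete policy.
POLICIES = {
    "csam": {
        "definition": "Child sexual abuse material",
        "action": "remove_and_ban",
        "report_to": "NCMEC",  # US legal reporting obligation
        "severity": "P0",
    },
    "harassment": {
        "definition": "Repeated unwanted contact after a user blocks you",
        "action": "warn_then_suspend",
        "severity": "P2",
    },
    "scams": {
        "definition": "Romance scams, financial fraud",
        "action": "suspend_pending_review",
        "severity": "P1",
    },
}

def lookup_action(category: str) -> str:
    """Return the enforcement action defined for a policy category."""
    return POLICIES[category]["action"]
```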
2. Build User Reporting Mechanism
Users need a way to report safety issues.
Minimum viable:
- "Report" button on content/profiles
- Basic form: what's wrong, where is it, who are you
- Email confirmation of receipt
- Someone monitoring reports daily
Tools to use:
- Your existing help desk (Zendesk, Intercom, Freshdesk)
- Google Forms + email forwarding (ultra-early stage)
- Dedicated report@yourplatform.com email address
Don't build custom tooling yet. Use existing systems, but standardize the fields every report captures (see the sketch below).
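Whatever intake tool you pick, the report record needs the same core fields. A minimal sketch, with illustrative field names that map just as well to a Zendesk ticket or a spreadsheet row:

```python
# Core fields every report record needs, whether it lives in Zendesk,
# a Google Form sheet, or an inbox. Field names are illustrative.
from dataclasses import dataclass, field
from datetime import datetime, timezone

@dataclass
class UserReport:
    reporter_id: str           # who is reporting
    reported_content_id: str   # where the issue is (content/profile ID or URL)
    category: str              # what's wrong (maps to your policy taxonomy)
    description: str = ""      # free-text detail from the reporter
    received_at: datetime = field(
        default_factory=lambda: datetime.now(timezone.utc)
    )
    status: str = "open"       # open -> reviewed -> actioned/dismissed
```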
3. Establish Review Process
Someone needs to review reports and take action.
Early stage (<10,000 users):
- Founder or early employee reviews reports
- 1-2 hours per day initially
- Decision: Remove content + warn user, suspend user, or dismiss the report (no action)
Response time target:
- CSAM / imminent harm: <4 hours
- Severe violations (terrorism, threats): <24 hours
- Everything else: <48 hours
Create decision log:
- Spreadsheet tracking: report received, decision made, action taken, date
- This becomes your transparency data later (a minimal sketch follows this list)
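The spreadsheet is fine at this stage; the sketch below just expresses the same append-only log as code, with illustrative column names, since these rows feed the transparency reporting described in Phase 3:

```python
# Append-only decision log: the spreadsheet, expressed as code.
# Column names are illustrative; these rows later feed transparency reports.
import csv
import os
from datetime import datetime, timezone

LOG_PATH = "moderation_decisions.csv"
COLUMNS = ["report_id", "received_at", "decision", "action_taken",
           "decided_at", "reviewer"]

def log_decision(report_id: str, received_at: str, decision: str,
                 action_taken: str, reviewer: str) -> None:
    """Append one moderation decision, writing the header on first use."""
    new_file = not os.path.exists(LOG_PATH) or os.path.getsize(LOG_PATH) == 0
    with open(LOG_PATH, "a", newline="") as f:
        writer = csv.writer(f)
        if new_file:
            writer.writerow(COLUMNS)
        writer.writerow([report_id, received_at, decision, action_taken,
                         datetime.now(timezone.utc).isoformat(), reviewer])
```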
4. Set Up Basic Detection
Even pre-launch, implement basic automated detection:
Technical controls:
- Email verification for sign-up
- CAPTCHA to prevent bots
- Rate limiting on posting/messaging
- Profanity filter (use existing library, don't build your own)
For images/video:
- PhotoDNA for CSAM detection (Microsoft provides free access to qualified organizations)
- Simple hashing to detect exact duplicates
For text:
- Keyword lists for extreme content (terrorism, CSAM, slurs)
- Don't over-filter - false positives destroy user experience
Realistic early stage: 90% reactive (user reports) + 10% proactive (automated detection of extreme content). Two of these controls are sketched below.
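As a concrete illustration of two of the controls above: a minimal sliding-window rate limiter, and a keyword flagger that routes matches to human review rather than auto-removing. The thresholds and the keyword set are placeholders, not recommendations:

```python
# Two of the controls above: a sliding-window rate limiter and a keyword
# flagger that routes matches to human review instead of auto-removing.
# Thresholds and the keyword set are placeholders, not recommendations.
import time
from collections import defaultdict, deque

WINDOW_SECONDS = 60
MAX_POSTS_PER_WINDOW = 10                   # illustrative threshold
EXTREME_KEYWORDS = {"example_banned_term"}  # placeholder; use vetted lists

_post_times: dict[str, deque] = defaultdict(deque)

def allow_post(user_id: str) -> bool:
    """Allow at most MAX_POSTS_PER_WINDOW posts per user per minute."""
    now = time.monotonic()
    times = _post_times[user_id]
    while times and now - times[0] > WINDOW_SECONDS:
        times.popleft()
    if len(times) >= MAX_POSTS_PER_WINDOW:
        return False
    times.append(now)
    return True

def flag_for_review(text: str) -> bool:
    """Flag (never remove) posts containing extreme keywords; humans decide."""
    return bool(set(text.lower().split()) & EXTREME_KEYWORDS)
```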
Phase 2: Scaling Infrastructure (Months 2-6)
As you grow from thousands to tens of thousands of users:
1. Formalize Moderation Operations
Hire your first Trust & Safety person:
- Title: Trust & Safety Manager or Trust & Safety Lead
- Reports to: Founder, CEO, or COO initially
- Responsibilities: Own moderation operations, vendor management, policy refinement
Typical first hire profile:
- 2-5 years content moderation experience
- Managed moderation teams or vendors
- Understands policy development
- Comfortable with ambiguity
Salary range: $80K - $120K depending on market and experience
2. Implement Moderation Vendor (If Needed)
When to use moderation vendors:
- Volume >100 reports/day
- Need 24/7 coverage
- Multi-language requirements
- Want to avoid hiring large in-house team
Major vendors:
- TaskUs - good for startups, flexible
- Accenture - enterprise-grade
- Telus International - mid-market
- Centific (formerly Pactera EDGE) - AI training + moderation
Typical pricing: $8-$15 per hour per moderator (offshore), $25-$40 (US-based)
Hybrid model (recommended):
- Vendor handles volume and 24/7 coverage
- Internal person handles escalations, policy decisions, quality assurance
3. Build Automated Detection (v2)
Invest in better proactive detection:
Text classification:
- Use pre-trained models (OpenAI Moderation API, Perspective API, Hive)
- Don't build your own ML models yet (too expensive, not better than off-the-shelf)
Image/video detection:
- Hive, Clarifai, or AWS Rekognition for NSFW detection
- PhotoDNA for CSAM (non-negotiable; US providers must report detected CSAM to NCMEC)
- Google Vision API for general classification
Behavioral signals:
- New account spam patterns (post volume, follow patterns)
- Coordinated activity detection
- Suspicious engagement patterns
Don't over-automate: Keep humans in the loop for final decisions on removal (the flag-for-review pattern is sketched below).
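A hedged sketch of that flag-for-review pattern using the OpenAI Moderation API, one of the pre-trained options listed above. The endpoint and response shape follow OpenAI's published docs, but verify against the current API before relying on it; note the classifier only routes content into a review queue and never removes anything on its own:

```python
# Flag-for-review with a hosted classifier. Endpoint and response shape
# follow OpenAI's published moderation API; verify against current docs.
import os
import requests

def moderate_text(text: str) -> dict:
    """Classify one piece of text; returns the first moderation result."""
    resp = requests.post(
        "https://api.openai.com/v1/moderations",
        headers={"Authorization": f"Bearer {os.environ['OPENAI_API_KEY']}"},
        json={"input": text},
        timeout=10,
    )
    resp.raise_for_status()
    return resp.json()["results"][0]

def route(text: str, review_queue: list) -> None:
    """Automation flags; humans make the final removal decision."""
    result = moderate_text(text)
    if result["flagged"]:
        review_queue.append({"text": text, "categories": result["categories"]})
```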
4. Create Appeals Process
Users will disagree with your moderation decisions. You need a way to handle appeals.
Basic appeals process:
- User clicks "Appeal" on enforcement notification
- Form asks: why do you think this was a mistake?
- Different reviewer (not original moderator) re-reviews
- Decision within 48-72 hours
- Overturn if wrong, uphold with explanation if correct
Track appeal metrics:
- Appeal rate (what % of enforcements are appealed)
- Overturn rate (what % of appeals result in reversal)
- Target overturn rate: 5-15% (too low suggests appeals are a rubber stamp; too high signals poor initial accuracy). Both metrics are computed in the sketch below.
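Computing both metrics from your enforcement records is simple arithmetic; a minimal sketch with an illustrative record shape:

```python
# Appeal rate and overturn rate from enforcement records
# (record shape is illustrative).
def appeal_metrics(enforcements: list[dict]) -> dict:
    """Each record: {"appealed": bool, "overturned": bool}."""
    total = len(enforcements)
    appealed = [e for e in enforcements if e["appealed"]]
    overturned = [e for e in appealed if e["overturned"]]
    return {
        "appeal_rate": len(appealed) / total if total else 0.0,
        "overturn_rate": len(overturned) / len(appealed) if appealed else 0.0,
    }

# Example: 200 enforcements, 20 appealed, 2 overturned
# -> appeal_rate 0.10, overturn_rate 0.10 (inside the 5-15% band).
```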
5. Establish Incident Response Plan
You will have safety incidents. Plan before they happen.
Define severity levels:
- P0 (Critical): CSAM, imminent harm, active shooter, regulatory enforcement
- P1 (High): Media coverage, high-profile user incident, wave of similar issues
- P2 (Medium): Individual serious violation, complaint from an authority
- P3 (Low): Standard policy violation
For each severity:
- Who gets notified (CEO, legal, PR, etc.)
- How quickly must we respond
- Who has decision authority
- Who communicates externally
Practice: Run incident simulations quarterly. A severity routing table like the sketch below keeps the plan executable rather than aspirational.
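One way to make the four questions above unambiguous is to encode them per severity level. The roles, notification lists, and response clocks here are illustrative; set your own in the written plan:

```python
# Severity levels as a routing table. Roles, notification lists, and
# response clocks are illustrative; set your own in the written plan.
SEVERITY_MATRIX = {
    "P0": {"notify": ["ceo", "legal", "pr", "ts_on_call"],
           "respond_within_hours": 1, "decides": "ceo", "external_comms": "pr"},
    "P1": {"notify": ["legal", "pr", "ts_lead"],
           "respond_within_hours": 4, "decides": "ts_lead", "external_comms": "pr"},
    "P2": {"notify": ["ts_lead"],
           "respond_within_hours": 24, "decides": "ts_lead", "external_comms": None},
    "P3": {"notify": [],
           "respond_within_hours": 48, "decides": "moderator", "external_comms": None},
}

def who_to_notify(severity: str) -> list[str]:
    """Notification list for an incident of the given severity."""
    return SEVERITY_MATRIX[severity]["notify"]
```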
Phase 3: Mature Operations (Months 6-18)
As you scale to hundreds of thousands or millions of users:
1. Build Cross-Functional Safety Processes
Trust & Safety can't be siloed. Integrate with:
Product: Safety review for all new features
- Who can use this feature? (age, verified users, etc.)
- What can go wrong? (harassment, spam, fraud)
- What safety controls are needed?
- Launch checklist includes safety sign-off (a minimal sketch follows)
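That checklist can be a literal launch blocker in code; a minimal sketch with illustrative items:

```python
# Feature safety review as a launch-blocking checklist (items illustrative).
SAFETY_LAUNCH_CHECKLIST = [
    "eligibility defined (age gates, verification requirements)",
    "abuse cases enumerated (harassment, spam, fraud)",
    "required controls shipped (reporting, blocking, rate limits)",
    "trust & safety sign-off recorded",
]

def ready_to_launch(completed: set[str]) -> bool:
    """A feature ships only when every checklist item is checked off."""
    return all(item in completed for item in SAFETY_LAUNCH_CHECKLIST)
```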
Engineering: Safety infrastructure roadmap
- Detection systems
- Logging and auditability
- Account action infrastructure (warnings, suspensions, bans)
- Appeals handling
Legal: Regulatory compliance alignment
- Which regulations apply
- What documentation is required
- Regulatory reporting obligations
- Authority relationship management
Communications/PR: Crisis communication
- Spokespeople trained on safety topics
- Pre-drafted holding statements
- Escalation triggers for PR involvement
2. Implement Quality Assurance
How do you know your moderation is accurate?
Sampling and audits:
- Review 10% of all moderation decisions weekly
- Separate QA team from front-line moderators
- Track accuracy by moderator, policy type, content category
Calibration sessions:
- Weekly meetings with moderators
- Review edge cases together
- Ensure consistent policy interpretation
- Update guidance based on new patterns
User feedback:
- Track appeal overturn rates
- Survey users after enforcement (why did this happen?)
- Incorporate feedback into policy refinement
Target accuracy: 92-96% (perfect accuracy is impossible with subjective policies). The sampling and per-policy accuracy rollup are sketched below.
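A minimal sketch of the weekly 10% sample and the per-policy accuracy rollup; the record shape is illustrative, and the "correct" label comes from the separate QA team:

```python
# Weekly 10% QA sample and per-policy accuracy rollup. The record shape
# is illustrative; the "correct" label comes from the separate QA team.
import random
from collections import defaultdict

def weekly_qa_sample(decisions: list[dict], rate: float = 0.10) -> list[dict]:
    """Random sample of the week's moderation decisions."""
    if not decisions:
        return []
    return random.sample(decisions, max(1, int(len(decisions) * rate)))

def accuracy_by_policy(audited: list[dict]) -> dict[str, float]:
    """audited records: {"policy": str, "correct": bool} after QA review."""
    totals: dict[str, int] = defaultdict(int)
    correct: dict[str, int] = defaultdict(int)
    for d in audited:
        totals[d["policy"]] += 1
        correct[d["policy"]] += int(d["correct"])
    return {p: correct[p] / totals[p] for p in totals}
```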
3. Develop Transparency Reporting
Even before it's legally required (DSA, etc.), build transparency reporting:
Track (an aggregation sketch follows this list):
- Volume of user reports received
- Volume of content actioned (removed, warnings, suspensions)
- Action rate (what % of reports lead to action)
- Average time to action
- Appeal volume and overturn rate
- Proactive detection vs. user-reported
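Every number in that list falls out of the decision log you have kept since Phase 1; a minimal aggregation sketch with illustrative field names:

```python
# Aggregating the tracked metrics from per-report records
# (field names illustrative; same data as the Phase 1 decision log).
def transparency_snapshot(reports: list[dict]) -> dict:
    """reports: {"actioned": bool, "hours_to_action": float, "source": str}."""
    actioned = [r for r in reports if r["actioned"]]
    proactive = [r for r in reports if r["source"] == "automated"]
    return {
        "reports_received": len(reports),
        "content_actioned": len(actioned),
        "action_rate": len(actioned) / len(reports) if reports else 0.0,
        "avg_hours_to_action": (sum(r["hours_to_action"] for r in actioned)
                                / len(actioned)) if actioned else 0.0,
        "proactive_share": len(proactive) / len(reports) if reports else 0.0,
    }
```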
Publish: Quarterly or semi-annually
Benefits:
- Accountability to users
- Investor/board confidence
- Regulatory compliance (if/when required)
- Benchmarking and improvement
Common Mistakes When Building Trust & Safety
Mistake 1: "We're Different, Normal Rules Don't Apply"
I hear this constantly: "We're a professional network, we don't have safety issues" or "Our users are vetted, we don't need moderation."
Reality: Every platform with humans has safety issues. LinkedIn has fraud and harassment. GitHub has harassment and spam. Professional platforms aren't exempt.
Solution: Assume you'll have safety issues and build accordingly.
Mistake 2: Over-Reliance on Automation
"We'll just use AI to moderate everything automatically."
Reality: Current AI can help detect, but shouldn't auto-remove except in narrow cases (exact CSAM hashes, known malware). Humans are needed for context and edge cases.
Solution: Automation flags for human review. Humans make final decisions.
Mistake 3: Vague Policies
"Be respectful" and "don't abuse the platform" are not enforceable policies.
Reality: Moderators and users need specific definitions. What exactly is harassment? What's hate speech? What's spam?
Solution: Write specific, enforceable policies with examples. "Harassment includes repeated unwanted contact after a user blocks you" is enforceable.
Mistake 4: No Policy Enforcement for "Important" Users
"This user drives a lot of engagement, we can't ban them even though they violate policies."
Reality: Inconsistent enforcement destroys trust and creates legal risk. If your policy says X is prohibited, you must enforce on everyone.
Solution: Enforce policies consistently. If an important user is valuable despite policy violations, change the policy—don't make exceptions.
Mistake 5: Treating Safety as Cost Center
"Trust & Safety doesn't make money, minimize spending on it."
Reality: Safety failures cost more than safety infrastructure. One major incident can cost millions in legal fees, settlements, lost users, and PR damage.
Solution: Treat safety as risk mitigation. The ROI is avoiding catastrophic losses.
Need Help Building Your Trust & Safety Program?
Echelon Advisory helps startups build Trust & Safety infrastructure from scratch, based on lessons from launching operations at Amazon, Google, and TikTok.
Services:
- Phase 1: Foundation Setup ($10K-$20K) - Policies, reporting, review process, basic detection
- Phase 2: Scaling Infrastructure ($25K-$50K) - Vendor selection, advanced detection, appeals, incident response
- Phase 3: Ongoing Advisory ($5K-$10K/month) - Strategic guidance, policy updates, compliance reviews
Key Takeaways
- Start building Trust & Safety before you need it (much cheaper than retrofitting)
- Minimum viable: policies, reporting mechanism, review process, basic detection
- Use off-the-shelf tools, don't build custom systems early
- Hire your first Trust & Safety person around 10K-50K users
- Human oversight required even with automation
- Integrate safety into product development rather than bolting it on afterward
- Treat safety as risk mitigation, not cost center
- Plan for compliance with applicable regulations from day 1
Building Trust & Safety infrastructure proactively is one of the best investments a startup can make. It's cheaper, faster, and less stressful than building it reactively during a crisis.
About the Author
Maneesha Pandey is the founder of Echelon Advisory Services, specializing in Trust & Safety, AI Governance, and EU regulatory compliance. She spent 14+ years building Trust & Safety infrastructure at Amazon, Google, and TikTok, including launching TikTok's LATAM Trust & Safety operations from scratch.