church-api/README_HTML_CLEANING.md
Benjamin Slingo 0c06e159bb Initial commit: Church API Rust implementation
Complete church management system with bulletin management, media processing, live streaming integration, and web interface. Includes authentication, email notifications, database migrations, and comprehensive test suite.
2025-08-19 20:56:41 -04:00

HTML Entity Cleaning Tool

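This tool permanently strips HTML tags and decodes HTML entities in the text fields of the database tables listed below. The cleaned values are written back to the database, so the change is not reversible without a backup.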

Quick Start

# Set your database URL (if not already set)
export DATABASE_URL="postgresql://user:pass@localhost/church_api"

# Run the cleaning tool
cargo run --bin clean-html-entities

What it does

🧹 Removes HTML tags: <p>, <div>, <strong>, etc.

🔧 Converts HTML entities:

  • &nbsp; → space
  • &amp; → &
  • &lt; → <
  • &gt; → >
  • &quot; → "
  • &#39; → '
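
The two steps above can be sketched in plain Rust. This is only an illustration using the standard library: `clean_html` is a hypothetical name, and the real tool reuses the sanitization functions already in the codebase, which may behave differently (for example around a bare `<` in ordinary text, which this sketch treats as the start of a tag).

```rust
/// Hypothetical helper (not the project's actual sanitizer): strips
/// anything that looks like a tag, then decodes the six entities
/// listed in this README.
pub fn clean_html(input: &str) -> String {
    // Pass 1: drop everything from '<' up to and including the next '>'.
    let mut stripped = String::with_capacity(input.len());
    let mut in_tag = false;
    for ch in input.chars() {
        match ch {
            '<' => in_tag = true,
            '>' if in_tag => in_tag = false,
            _ if !in_tag => stripped.push(ch),
            _ => {} // character inside a tag: discard
        }
    }

    // Pass 2: decode entities. "&amp;" goes last so that input such as
    // "&amp;lt;" is decoded once (to "&lt;") rather than twice.
    const ENTITIES: [(&str, &str); 6] = [
        ("&nbsp;", " "),
        ("&lt;", "<"),
        ("&gt;", ">"),
        ("&quot;", "\""),
        ("&#39;", "'"),
        ("&amp;", "&"),
    ];
    let mut out = stripped;
    for (entity, replacement) in ENTITIES {
        out = out.replace(entity, replacement);
    }
    out
}
```

Under these assumptions, `clean_html("<p>Potluck &amp; fellowship</p>")` yields `Potluck & fellowship`.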

Tables cleaned

  • bulletins: title, sabbath_school, divine_worship, scripture_reading, sunset
  • events: title, description, location, location_url, approved_from
  • pending_events: title, description, location, location_url, admin_notes, submitter_email, bulletin_week
  • members: first_name, last_name, address, notes, emergency_contact_name, membership_status
  • church_config: church_name, contact_email, church_address, po_box, google_maps_url, about_text
  • users: username, email, name, avatar_url, role
  • media_items: title, speaker, description, scripture_reading (if table exists)
  • transcoded_media: error_message, transcoding_method (if table exists)
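
As a rough sketch of how a table-to-column list like this could drive the "dirty record" count, the snippet below builds a PostgreSQL query per table using the `~` regex operator. The names `TARGETS`, `DIRTY_PATTERN`, and `dirty_count_sql` are assumptions made for illustration, only two tables are shown, and the actual tool may query the database quite differently.

```rust
/// Assumed layout: (table, text columns), copied from the list above.
/// Only the first two tables are shown to keep the sketch short.
pub const TARGETS: &[(&str, &[&str])] = &[
    ("bulletins", &["title", "sabbath_school", "divine_worship", "scripture_reading", "sunset"]),
    ("events", &["title", "description", "location", "location_url", "approved_from"]),
];

/// Assumed pattern: something tag-like, or one of the six listed entities.
pub const DIRTY_PATTERN: &str = "<[^>]+>|&(nbsp|amp|lt|gt|quot|#39);";

/// Builds a COUNT query matching rows where any listed column looks dirty.
/// Table and column names come from the fixed list above, never from user
/// input, so they are interpolated directly.
pub fn dirty_count_sql(table: &str, columns: &[&str]) -> String {
    let conditions: Vec<String> = columns
        .iter()
        .map(|col| format!("{} ~ '{}'", col, DIRTY_PATTERN))
        .collect();
    format!("SELECT COUNT(*) FROM {} WHERE {}", table, conditions.join(" OR "))
}

/// One query per configured table.
pub fn all_dirty_count_queries() -> Vec<String> {
    TARGETS
        .iter()
        .map(|(table, columns)| dirty_count_sql(table, columns))
        .collect()
}
```

Keeping the list in one constant means the analysis, cleanup, and verification passes would all cover the same columns.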

Safety features

  • Smart: Only processes records that actually need cleaning
  • 📊 Informative: Shows exactly how many records were cleaned
  • 🔍 Verification: Counts dirty records before and after
  • ⏱️ Fast: Uses existing sanitization functions from your codebase
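
The "only process what needs cleaning" and "count before and after" behaviour might look roughly like the in-memory sketch below. `needs_cleaning` and `clean_dirty` are hypothetical names, and the cleaner is passed in as a closure so the sketch stays independent of whichever sanitization function the codebase provides.

```rust
/// Hypothetical predicate: true when the text contains one of the six
/// listed entities or something tag-like (a '<' followed later by a '>').
pub fn needs_cleaning(text: &str) -> bool {
    const ENTITIES: [&str; 6] = ["&nbsp;", "&amp;", "&lt;", "&gt;", "&quot;", "&#39;"];
    let has_entity = ENTITIES.iter().any(|e| text.contains(*e));
    let has_tag = text
        .find('<')
        .map_or(false, |start| text[start..].contains('>'));
    has_entity || has_tag
}

/// Rewrites only the values that need it and returns how many were changed,
/// mirroring the "Cleaned N records" lines in the tool's output.
pub fn clean_dirty<F>(values: &mut [String], cleaner: F) -> usize
where
    F: Fn(&str) -> String,
{
    let mut cleaned = 0;
    for value in values.iter_mut() {
        if needs_cleaning(value.as_str()) {
            let replacement = cleaner(value.as_str());
            *value = replacement;
            cleaned += 1;
        }
    }
    cleaned
}
```

Running the predicate again over the same values after `clean_dirty` gives the verification count; a result of zero corresponds to the "No HTML entities remaining" message.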

Example output

🧹 Church API - HTML Entity Cleaning Tool
==========================================

📡 Connecting to database...
✅ Connected successfully!

🔍 Analyzing database for HTML entities...
📊 Found 23 records with HTML tags or entities

🚀 Starting HTML entity cleanup...

🔧 Cleaning bulletins table...
   ✅ Cleaned 5 bulletin records
🔧 Cleaning events table...
   ✅ Cleaned 12 event records
🔧 Cleaning pending_events table...
   ✅ Cleaned 3 pending event records
🔧 Cleaning members table...
   ✅ Cleaned 2 member records
🔧 Cleaning church_config table...
   ✅ Cleaned 1 church config records
🔧 Cleaning users table...
   ✅ Cleaned 0 user records
🔧 Cleaning media_items table...
   ✅ Cleaned 0 media item records
🔧 Cleaning transcoded_media table...
   ✅ Cleaned 0 transcoded media records

🎉 Cleanup completed!
📊 Total records cleaned: 23
⏱️  Duration: 145ms

🔍 Verifying cleanup...
✅ Success! No HTML entities remaining in database.

Benefits after running

🚀 Faster API responses - No more cleaning on every request
🔒 Clean database - All text data is now pure and clean
📊 Better queries - Direct database queries return clean data
🛡️ Complete solution - Works with the existing API sanitization

Your API will now return completely clean data with no HTML entities! 🎉