church-api/README_HTML_CLEANING.md
Benjamin Slingo 0c06e159bb Initial commit: Church API Rust implementation
Complete church management system with bulletin management, media processing, live streaming integration, and web interface. Includes authentication, email notifications, database migrations, and comprehensive test suite.
2025-08-19 20:56:41 -04:00


# HTML Entity Cleaning Tool
This tool permanently strips HTML tags and converts HTML entities to plain characters in the database text fields listed under "Tables cleaned". Changes are written in place.
## Quick Start
```bash
# Set your database URL (if not already set)
export DATABASE_URL="postgresql://user:pass@localhost/church_api"
# Run the cleaning tool
cargo run --bin clean-html-entities
```
## What it does
🧹 **Removes HTML tags**: `<p>`, `<div>`, `<strong>`, etc.
🔧 **Converts HTML entities**:
- `&nbsp;` → space
- `&amp;` → `&`
- `&lt;` → `<`
- `&gt;` → `>`
- `&quot;` → `"`
- `&#39;` → `'`
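
The two steps above can be sketched in plain Rust. This is a minimal illustration using only the standard library, not the project's actual sanitization code; the function name `clean_html` and the order of the two passes are assumptions.

```rust
/// Hypothetical helper (not the project's real sanitizer): strips HTML tags,
/// then decodes the entities listed in this README.
fn clean_html(input: &str) -> String {
    // Pass 1: drop everything between '<' and '>' (a naive tag stripper).
    let mut stripped = String::with_capacity(input.len());
    let mut in_tag = false;
    for ch in input.chars() {
        match ch {
            '<' => in_tag = true,
            '>' if in_tag => in_tag = false,
            _ if !in_tag => stripped.push(ch),
            _ => {} // character inside a tag: skip it
        }
    }

    // Pass 2: decode entities. `&amp;` is handled last so that a
    // double-escaped `&amp;lt;` becomes the text `&lt;`, not `<`.
    stripped
        .replace("&nbsp;", " ")
        .replace("&lt;", "<")
        .replace("&gt;", ">")
        .replace("&quot;", "\"")
        .replace("&#39;", "'")
        .replace("&amp;", "&")
}
```

For example, `clean_html("<p>Tom &amp; Jerry</p>")` yields `Tom & Jerry`. Because the stripper is naive, a literal `<` in ordinary text (such as `a < b`) is treated as the start of a tag, so a production tool would want a real HTML parser.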
## Tables cleaned
- **bulletins**: title, sabbath_school, divine_worship, scripture_reading, sunset
- **events**: title, description, location, location_url, approved_from
- **pending_events**: title, description, location, location_url, admin_notes, submitter_email, bulletin_week
- **members**: first_name, last_name, address, notes, emergency_contact_name, membership_status
- **church_config**: church_name, contact_email, church_address, po_box, google_maps_url, about_text
- **users**: username, email, name, avatar_url, role
- **media_items**: title, speaker, description, scripture_reading (if table exists)
- **transcoded_media**: error_message, transcoding_method (if table exists)
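
One way to picture how a tool could walk this list is a static table-to-columns map plus a query builder that counts rows still containing markup. The sketch below is an assumption about how such a check might look, not the tool's real implementation: `TARGETS`, `dirty_count_sql`, and the regular expression are hypothetical, and the query relies on PostgreSQL's `~` regex-match operator.

```rust
/// Hypothetical table/column map mirroring the list in this README.
const TARGETS: &[(&str, &[&str])] = &[
    ("bulletins", &["title", "sabbath_school", "divine_worship", "scripture_reading", "sunset"]),
    ("events", &["title", "description", "location", "location_url", "approved_from"]),
    ("pending_events", &["title", "description", "location", "location_url", "admin_notes", "submitter_email", "bulletin_week"]),
    ("members", &["first_name", "last_name", "address", "notes", "emergency_contact_name", "membership_status"]),
    ("church_config", &["church_name", "contact_email", "church_address", "po_box", "google_maps_url", "about_text"]),
    ("users", &["username", "email", "name", "avatar_url", "role"]),
    ("media_items", &["title", "speaker", "description", "scripture_reading"]),
    ("transcoded_media", &["error_message", "transcoding_method"]),
];

/// Builds a COUNT query matching rows where any listed column appears to
/// contain an HTML tag or entity. The pattern is an illustrative guess.
fn dirty_count_sql(table: &str, columns: &[&str]) -> String {
    let conditions: Vec<String> = columns
        .iter()
        .map(|c| format!("{c} ~ '<[^>]+>|&[a-zA-Z0-9#]+;'"))
        .collect();
    format!("SELECT COUNT(*) FROM {table} WHERE {}", conditions.join(" OR "))
}
```

Iterating over `TARGETS` and calling `dirty_count_sql(table, columns)` for each entry produces one count query per table. Table and column names are interpolated directly, which is only acceptable here because they come from a fixed compile-time list rather than user input.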
## Safety features
- **Smart**: Only processes records that actually need cleaning
- 📊 **Informative**: Shows exactly how many records were cleaned
- 🔍 **Verification**: Counts dirty records before and after
- ⏱️ **Fast**: Uses existing sanitization functions from your codebase
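
The "only processes records that need cleaning" and "shows how many were cleaned" behaviors can be illustrated with an in-memory sketch. It is an assumption about the general approach rather than the tool's code: `clean_in_place` is a hypothetical name, and the cleaner is passed in as a closure so the example stays self-contained.

```rust
/// Hypothetical helper: applies `clean` to each value, writes back only the
/// values that actually change, and returns how many were rewritten.
fn clean_in_place<F>(values: &mut [String], clean: F) -> usize
where
    F: Fn(&str) -> String,
{
    let mut cleaned = 0;
    for value in values.iter_mut() {
        let candidate = clean(value.as_str());
        // Skip values that are already clean so they are never rewritten.
        if candidate != *value {
            *value = candidate;
            cleaned += 1;
        }
    }
    cleaned
}
```

Called on `["<p>Hi</p>", "plain"]` with a closure that removes `<p>` and `</p>`, the function rewrites one value and returns `1`; a second call returns `0`, which is the same before-and-after check the verification step describes.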
## Example output
```
🧹 Church API - HTML Entity Cleaning Tool
==========================================
📡 Connecting to database...
✅ Connected successfully!
🔍 Analyzing database for HTML entities...
📊 Found 23 records with HTML tags or entities
🚀 Starting HTML entity cleanup...
🔧 Cleaning bulletins table...
✅ Cleaned 5 bulletin records
🔧 Cleaning events table...
✅ Cleaned 12 event records
🔧 Cleaning pending_events table...
✅ Cleaned 3 pending event records
🔧 Cleaning members table...
✅ Cleaned 2 member records
🔧 Cleaning church_config table...
✅ Cleaned 1 church config records
🔧 Cleaning users table...
✅ Cleaned 0 user records
🔧 Cleaning media_items table...
✅ Cleaned 0 media item records
🔧 Cleaning transcoded_media table...
✅ Cleaned 0 transcoded media records
🎉 Cleanup completed!
📊 Total records cleaned: 23
⏱️ Duration: 145ms
🔍 Verifying cleanup...
✅ Success! No HTML entities remaining in database.
```
## Benefits after running
- 🚀 **Faster API responses** - No more cleaning on every request
- 🔒 **Clean database** - Stored text no longer contains HTML tags or entities
- 📊 **Better queries** - Direct database queries return clean data
- 🛡️ **Complete solution** - Works with the existing API sanitization

Your API will now return clean data with no HTML entities! 🎉