שיתוף | skill לpdf - מopenai (מתוך תהליך החשיבה של chatgpt).

י. פל.

אני לא מצליח להכניס הכל לתוך בלוק-קוד, משום שהוא נקטע באמצע...
הכנסתי הכל לספויילר.

name: pdfs
description: Reliable, workflow-driven PDF processing: render → verify → operate → re-render/verify, covering reading, inspection, extraction, editing, forms, OCR, redaction, conversion, and diffing. Prefer authoring in DOCX or PPTX (then converting to PDF) for text-heavy docs or slide-like layouts; use ReportLab here for programmatic PDF generation.

PDF Skill (Read • Inspect • Extract • Edit • Render • Forms • OCR • Redact • Convert • Diff)

This skill is designed for reliable, workflow-driven PDF work: render -> verify -> operate -> re-render verify.

Before you touch PDFs: should this be DOCX/PPTX instead?

Even if the user asks for a PDF deliverable, the best workflow is often:

Text-heavy, business-doc layout (headings, TOC, long tables, rich lists) -> use the DOCX skill to author, then convert to PDF with lo_convert_to_pdf.py.
Slide-like visual layout (charts, callouts, fixed positioning, figure captions) -> use the Slides skill (PPTX) to author, then export to PDF.
Programmatic generation -> ReportLab (this skill) is fine.

If you find yourself hand-tuning line breaks or typography in ReportLab, you probably picked the wrong authoring format.

Core loop (always)

Render to images

python /home/oai/skills/pdfs/scripts/render_pdf.py input.pdf --out_dir /mnt/data/_renders/in --dpi 200

Inspect PNGs (tables/figures/layout are authoritative)
Perform the edit/extract/create
Re-render and compare

python /home/oai/skills/pdfs/scripts/compare_renders.py before.pdf after.pdf --out_dir /mnt/data/_diff --dpi 200

Task index (progressive)

Start with the smallest task that answers the user:

Read / review

tasks/read_review.md

Extract (text/layout/tables/images/attachments/forms)

tasks/extract.md
tasks/coords.md (coordinate sanity)

Edit (merge/split/rotate/crop/watermark/paginate/encrypt/repair)

tasks/edit.md
tasks/compare.md (visual regression)

Forms

Fillable forms: tasks/forms_annotations.md
Debugging/introspection: tasks/forms_debugging.md
Non-fillable / stamping workflow: tasks/forms_nonfillable.md

OCR

tasks/ocr.md

Preflight / normalize

tasks/preflight.md

Redaction

tasks/redact.md

Renderer parity

tasks/parity.md

Batch processing

tasks/batch.md

Create / convert

tasks/create.md
tasks/convert.md
tasks/js_tools.md (pdf-lib, pdfjs)

Package map (where things live)

This pack includes a manifest.txt that is a pure list of relative file paths used by download tooling.

Quick map:

tasks/ (what to do)
- read_review.md - render-first reading/review
- extract.md - extract text/layout/tables/images/attachments/forms
- coords.md - coordinate system cheatsheet (PDF pt vs image px)
- edit.md - merge/split/select/rotate/crop/watermark/paginate/encrypt/repair
- compare.md - visual diff workflow
- forms_annotations.md - fillable forms + appearance pitfalls + correctness checklist
- forms_debugging.md - widget-level introspection + acceptable values
- forms_nonfillable.md - stamp-by-boxes workflow for non-fillable forms
- ocr.md - OCR scanned PDFs to searchable
- preflight.md - quick triage + normalization guidance
- redact.md - true redaction workflows
- parity.md - render parity across engines
- batch.md - batch helpers for corpora
- create.md - choose reportlab/latex/html/docx/pptx pipeline
- convert.md - docx/pptx/html/markdown/latex to PDF conversion
- js_tools.md - pdf-lib/pdfjs helper CLIs
scripts/ (run these)
- render_pdf.py - render to PNGs (pdfium or poppler)
- compare_renders.py - render-and-diff two PDFs (pixel diff)
- pdf_inspect.py - metadata/structure overview
- pdf_extract.py - text/words/chars/tables/images/attachments/annots/forms
- pdf_edit.py - editing toolkit (merge/split/select/rotate/crop/watermark/paginate/encrypt/repair/optimize)
- pdf_preflight.py - preflight/triage warnings
- pdf_redact.py - true redaction (remove underlying content)
- renderer_parity.py - diff pdftoppm vs pdfium renders
- batch_pdf.py - batch runner for common ops
- box_picker_html.py - generate interactive HTML to pick rectangles -> JSON in PDF coords
- place_text_by_boxes.py - stamp text/checkmarks into rectangles (non-fillable forms)
- ocr_pdf.py - OCR wrapper
- html_to_pdf.py, md_to_pdf.py, latex_to_pdf.py, lo_convert_to_pdf.py - conversion helpers
js/ (Node helpers)
- install_deps.sh - installs pdf-lib + pdfjs-dist
- fill_form.mjs - fill + optional flatten (supports flags and positional args)
- extract_form_fields.mjs - list AcroForm fields
- extract_text_pdfjs.mjs - extract text via pdfjs-dist
examples/
- smoke_test.md - runnable smoke flows
troubleshooting/
- common.md - common pitfalls and fixes

Final deliverable expectations

No clipped text, overlaps, black squares, or broken glyphs in rendered PNGs.
Verify in at least one renderer (pdfium or pdftoppm). For tricky forms, verify in two.
Remove intermediate artifacts from the deliverable folder (keep only final PDF(s)).
Avoid Unicode dashes that some renderers mishandle; prefer ASCII -.

ארי

@י.-פל. מה המטרה
יצירת סקיל?

י. פל.

@ארי
גם.
וגם היכרות עם היכולות שלו.