Sharing | A PDF skill from OpenAI (taken from ChatGPT's thinking process).
I couldn't fit everything into a code block because it gets cut off in the middle, so I put it all in a spoiler.
name: pdfs
description: Reliable, workflow-driven PDF processing: render → verify → operate → re-render/verify, covering reading, inspection, extraction, editing, forms, OCR, redaction, conversion, and diffing. Prefer authoring in DOCX or PPTX (then converting to PDF) for text-heavy docs or slide-like layouts; use ReportLab here for programmatic PDF generation.
PDF Skill (Read • Inspect • Extract • Edit • Render • Forms • OCR • Redact • Convert • Diff)
This skill is designed for reliable, workflow-driven PDF work: render -> verify -> operate -> re-render/verify.
Before you touch PDFs: should this be DOCX/PPTX instead?
Even if the user asks for a PDF deliverable, the best workflow is often:
- Text-heavy, business-doc layout (headings, TOC, long tables, rich lists) -> use the DOCX skill to author, then convert to PDF with lo_convert_to_pdf.py.
- Slide-like visual layout (charts, callouts, fixed positioning, figure captions) -> use the Slides skill (PPTX) to author, then export to PDF.
- Programmatic generation -> ReportLab (this skill) is fine.
If you find yourself hand-tuning line breaks or typography in ReportLab, you probably picked the wrong authoring format.
Core loop (always)
- Render to images
  python /home/oai/skills/pdfs/scripts/render_pdf.py input.pdf --out_dir /mnt/data/_renders/in --dpi 200
- Inspect PNGs (tables/figures/layout are authoritative)
- Perform the edit/extract/create
- Re-render and compare
  python /home/oai/skills/pdfs/scripts/compare_renders.py before.pdf after.pdf --out_dir /mnt/data/_diff --dpi 200
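The core loop can be wrapped in a small driver. The following is only a sketch: the script paths and the `--out_dir` / `--dpi` flags are the ones quoted in the skill text, while the function names (`render_cmd`, `compare_cmd`, `core_loop`) and the `operate` callback are hypothetical names introduced here for illustration. The manual "inspect PNGs" step is left as a comment because it is a human (or model) review step, not a command.

```python
import subprocess

# Script directory as quoted in the skill text.
SCRIPTS = "/home/oai/skills/pdfs/scripts"


def render_cmd(pdf, out_dir, dpi=200):
    """Build the render command with the flags shown in the skill text."""
    return ["python", f"{SCRIPTS}/render_pdf.py", str(pdf),
            "--out_dir", str(out_dir), "--dpi", str(dpi)]


def compare_cmd(before, after, out_dir, dpi=200):
    """Build the render-and-diff command with the flags shown in the skill text."""
    return ["python", f"{SCRIPTS}/compare_renders.py", str(before), str(after),
            "--out_dir", str(out_dir), "--dpi", str(dpi)]


def core_loop(before, after, operate,
              renders_dir="/mnt/data/_renders/in",
              diff_dir="/mnt/data/_diff",
              run=subprocess.run):
    """render -> (inspect) -> operate -> re-render and compare.

    `operate` is a hypothetical callback that reads `before` and writes `after`.
    `run` is injectable so the flow can be exercised without the real scripts.
    """
    run(render_cmd(before, renders_dir), check=True)
    # Inspect the PNGs in renders_dir here before changing anything.
    operate(before, after)
    run(compare_cmd(before, after, diff_dir), check=True)
```

Passing `check=True` makes a failing render or diff stop the loop instead of silently continuing to the next step.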
Task index (progressive)
Start with the smallest task that answers the user:
Read / review
- tasks/read_review.md
Extract (text/layout/tables/images/attachments/forms)
- tasks/extract.md
- tasks/coords.md (coordinate sanity)
Edit (merge/split/rotate/crop/watermark/paginate/encrypt/repair)
- tasks/edit.md
- tasks/compare.md (visual regression)
Forms
- Fillable forms: tasks/forms_annotations.md
- Debugging/introspection: tasks/forms_debugging.md
- Non-fillable / stamping workflow: tasks/forms_nonfillable.md
OCR
- tasks/ocr.md
Preflight / normalize
- tasks/preflight.md
Redaction
- tasks/redact.md
Renderer parity
- tasks/parity.md
Batch processing
- tasks/batch.md
Create / convert
- tasks/create.md
- tasks/convert.md
- tasks/js_tools.md (pdf-lib, pdfjs)
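The "coordinate sanity" entry (tasks/coords.md) is about the mismatch between PDF points and rendered image pixels. The contents of coords.md are not shown in this post, so the following is only a sketch of the usual conversion, assuming an unrotated page whose visible box starts at (0, 0): PDF user space uses 72 points per inch with the origin at the bottom-left and y pointing up, while a rendered PNG has its origin at the top-left with y pointing down. The function names are hypothetical.

```python
def pt_to_px(value_pt, dpi=200):
    """Convert a length in PDF points (1/72 inch) to pixels at the given DPI."""
    return value_pt * dpi / 72.0


def pdf_rect_to_px(rect_pt, page_height_pt, dpi=200):
    """Map a PDF rectangle (x0, y0, x1, y1), bottom-left origin, to
    (left, top, right, bottom) in image pixels, top-left origin.

    Assumes no page rotation and a page box anchored at (0, 0).
    """
    x0, y0, x1, y1 = rect_pt
    left = pt_to_px(min(x0, x1), dpi)
    right = pt_to_px(max(x0, x1), dpi)
    # Flip the y axis: the highest PDF y becomes the smallest pixel y.
    top = pt_to_px(page_height_pt - max(y0, y1), dpi)
    bottom = pt_to_px(page_height_pt - min(y0, y1), dpi)
    return (left, top, right, bottom)
```

For example, a US Letter page (612 x 792 pt) rendered at the 200 DPI used in the core loop comes out at 1700 x 2200 px, which is a quick way to check that a render and a set of box coordinates agree.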
Package map (where things live)
This pack includes a manifest.txt that is a pure list of relative file paths used by download tooling.
Quick map:
- tasks/ (what to do)
  - read_review.md - render-first reading/review
  - extract.md - extract text/layout/tables/images/attachments/forms
  - coords.md - coordinate system cheatsheet (PDF pt vs image px)
  - edit.md - merge/split/select/rotate/crop/watermark/paginate/encrypt/repair
  - compare.md - visual diff workflow
  - forms_annotations.md - fillable forms + appearance pitfalls + correctness checklist
  - forms_debugging.md - widget-level introspection + acceptable values
  - forms_nonfillable.md - stamp-by-boxes workflow for non-fillable forms
  - ocr.md - OCR scanned PDFs to searchable
  - preflight.md - quick triage + normalization guidance
  - redact.md - true redaction workflows
  - parity.md - render parity across engines
  - batch.md - batch helpers for corpora
  - create.md - choose reportlab/latex/html/docx/pptx pipeline
  - convert.md - docx/pptx/html/markdown/latex to PDF conversion
  - js_tools.md - pdf-lib/pdfjs helper CLIs
- scripts/ (run these)
  - render_pdf.py - render to PNGs (pdfium or poppler)
  - compare_renders.py - render-and-diff two PDFs (pixel diff)
  - pdf_inspect.py - metadata/structure overview
  - pdf_extract.py - text/words/chars/tables/images/attachments/annots/forms
  - pdf_edit.py - editing toolkit (merge/split/select/rotate/crop/watermark/paginate/encrypt/repair/optimize)
  - pdf_preflight.py - preflight/triage warnings
  - pdf_redact.py - true redaction (remove underlying content)
  - renderer_parity.py - diff pdftoppm vs pdfium renders
  - batch_pdf.py - batch runner for common ops
  - box_picker_html.py - generate interactive HTML to pick rectangles -> JSON in PDF coords
  - place_text_by_boxes.py - stamp text/checkmarks into rectangles (non-fillable forms)
  - ocr_pdf.py - OCR wrapper
  - html_to_pdf.py, md_to_pdf.py, latex_to_pdf.py, lo_convert_to_pdf.py - conversion helpers
- js/ (Node helpers)
  - install_deps.sh - installs pdf-lib + pdfjs-dist
  - fill_form.mjs - fill + optional flatten (supports flags and positional args)
  - extract_form_fields.mjs - list AcroForm fields
  - extract_text_pdfjs.mjs - extract text via pdfjs-dist
- examples/
  - smoke_test.md - runnable smoke flows
- troubleshooting/
  - common.md - common pitfalls and fixes
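The map describes compare_renders.py as a "pixel diff" of two renders. The script itself is not included in this post, so the following is not its implementation, just a minimal sketch of what a pixel diff computes. It assumes each rendered page has already been loaded as rows of grayscale values (0-255); the function name and the `tolerance` parameter are hypothetical.

```python
def pixel_diff(before, after, tolerance=0):
    """Compare two grayscale pages given as lists of rows of ints (0-255).

    Returns (changed_ratio, bbox), where bbox is (left, top, right, bottom)
    around all changed pixels, or None when nothing changed.
    """
    if len(before) != len(after) or any(
        len(r1) != len(r2) for r1, r2 in zip(before, after)
    ):
        raise ValueError("renders must have identical dimensions")

    # Collect coordinates of pixels whose difference exceeds the tolerance.
    changed = [
        (x, y)
        for y, (row1, row2) in enumerate(zip(before, after))
        for x, (p1, p2) in enumerate(zip(row1, row2))
        if abs(p1 - p2) > tolerance
    ]
    if not changed:
        return 0.0, None

    total = sum(len(row) for row in before)
    xs = [x for x, _ in changed]
    ys = [y for _, y in changed]
    bbox = (min(xs), min(ys), max(xs) + 1, max(ys) + 1)
    return len(changed) / total, bbox
```

A small tolerance helps ignore anti-aliasing noise between renders, and the bounding box points at the region worth inspecting by eye.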
Final deliverable expectations
- No clipped text, overlaps, black squares, or broken glyphs in rendered PNGs.
- Verify in at least one renderer (pdfium or pdftoppm). For tricky forms, verify in two.
- Remove intermediate artifacts from the deliverable folder (keep only final PDF(s)).
- Avoid Unicode dashes that some renderers mishandle; prefer ASCII -.
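The last point (prefer ASCII `-` over Unicode dashes) is easy to enforce on text before it goes into a generated PDF. The skill text does not say which characters it has in mind, so the set below is an assumption covering the common Unicode hyphen and dash code points; `ascii_dashes` is a hypothetical helper name.

```python
# Assumed set of dash-like code points; the skill text does not list them.
DASHES = {
    "\u2010": "-",  # hyphen
    "\u2011": "-",  # non-breaking hyphen
    "\u2012": "-",  # figure dash
    "\u2013": "-",  # en dash
    "\u2014": "-",  # em dash
    "\u2015": "-",  # horizontal bar
    "\u2212": "-",  # minus sign
}
_DASH_TABLE = str.maketrans(DASHES)


def ascii_dashes(text):
    """Replace Unicode dash variants with the ASCII hyphen-minus."""
    return text.translate(_DASH_TABLE)
```

Running every string through a helper like this before drawing it keeps the "no broken glyphs" check from failing on fonts that lack those code points.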