דילוג לתוכן
  • חוקי הפורום
  • פופולרי
  • לא נפתר
  • משתמשים
  • חיפוש גוגל בפורום
  • צור קשר
עיצובים
  • Light
  • Brite
  • Cerulean
  • Cosmo
  • Flatly
  • Journal
  • Litera
  • Lumen
  • Lux
  • Materia
  • Minty
  • Morph
  • Pulse
  • Sandstone
  • Simplex
  • Sketchy
  • Spacelab
  • United
  • Yeti
  • Zephyr
  • Dark
  • Cyborg
  • Darkly
  • Quartz
  • Slate
  • Solar
  • Superhero
  • Vapor

  • ברירת מחדל (ללא עיצוב (ברירת מחדל))
  • ללא עיצוב (ברירת מחדל)
כיווץ
מתמחים טופ
  1. דף הבית
  2. קטגוריות בהרצה
  3. תכנות
  4. בינה מלאכותית - AI
  5. עזרה הדדית - בינה מלאכותית
  6. שיתוף | skill לpdf - מopenai (מתוך תהליך החשיבה של chatgpt).

שיתוף | skill לpdf - מopenai (מתוך תהליך החשיבה של chatgpt).

מתוזמן נעוץ נעול הועבר עזרה הדדית - בינה מלאכותית
3 פוסטים 2 כותבים 139 צפיות 3 עוקבים
  • מהישן לחדש
  • מהחדש לישן
  • הכי הרבה הצבעות
תגובה
  • תגובה כנושא
התחברו כדי לפרסם תגובה
נושא זה נמחק. רק משתמשים עם הרשאות מתאימות יוכלו לצפות בו.
  • י. פל.י מנותק
    י. פל.י מנותק
    י. פל.
    כתב נערך לאחרונה על ידי
    #1

    אני לא מצליח להכניס הכל לתוך בלוק-קוד, משום שהוא נקטע באמצע...
    הכנסתי הכל לספויילר.

    name: pdfs
    description: Reliable, workflow-driven PDF processing: render → verify → operate → re-render/verify, covering reading, inspection, extraction, editing, forms, OCR, redaction, conversion, and diffing. Prefer authoring in DOCX or PPTX (then converting to PDF) for text-heavy docs or slide-like layouts; use ReportLab here for programmatic PDF generation.

    PDF Skill (Read • Inspect • Extract • Edit • Render • Forms • OCR • Redact • Convert • Diff)

    This skill is designed for reliable, workflow-driven PDF work: render -> verify -> operate -> re-render verify.

    Before you touch PDFs: should this be DOCX/PPTX instead?

    Even if the user asks for a PDF deliverable, the best workflow is often:

    • Text-heavy, business-doc layout (headings, TOC, long tables, rich lists) -> use the DOCX skill to author, then convert to PDF with lo_convert_to_pdf.py.
    • Slide-like visual layout (charts, callouts, fixed positioning, figure captions) -> use the Slides skill (PPTX) to author, then export to PDF.
    • Programmatic generation -> ReportLab (this skill) is fine.

    If you find yourself hand-tuning line breaks or typography in ReportLab, you probably picked the wrong authoring format.


    Core loop (always)

    1. Render to images
    python /home/oai/skills/pdfs/scripts/render_pdf.py input.pdf --out_dir /mnt/data/_renders/in --dpi 200
    
    1. Inspect PNGs (tables/figures/layout are authoritative)

    2. Perform the edit/extract/create

    3. Re-render and compare

    python /home/oai/skills/pdfs/scripts/compare_renders.py before.pdf after.pdf --out_dir /mnt/data/_diff --dpi 200
    

    Task index (progressive)

    Start with the smallest task that answers the user:

    Read / review

    • tasks/read_review.md

    Extract (text/layout/tables/images/attachments/forms)

    • tasks/extract.md
    • tasks/coords.md (coordinate sanity)

    Edit (merge/split/rotate/crop/watermark/paginate/encrypt/repair)

    • tasks/edit.md
    • tasks/compare.md (visual regression)

    Forms

    • Fillable forms: tasks/forms_annotations.md
    • Debugging/introspection: tasks/forms_debugging.md
    • Non-fillable / stamping workflow: tasks/forms_nonfillable.md

    OCR

    • tasks/ocr.md

    Preflight / normalize

    • tasks/preflight.md

    Redaction

    • tasks/redact.md

    Renderer parity

    • tasks/parity.md

    Batch processing

    • tasks/batch.md

    Create / convert

    • tasks/create.md
    • tasks/convert.md
    • tasks/js_tools.md (pdf-lib, pdfjs)

    Package map (where things live)

    This pack includes a manifest.txt that is a pure list of relative file paths used by download tooling.

    Quick map:

    • tasks/ (what to do)

      • read_review.md - render-first reading/review
      • extract.md - extract text/layout/tables/images/attachments/forms
      • coords.md - coordinate system cheatsheet (PDF pt vs image px)
      • edit.md - merge/split/select/rotate/crop/watermark/paginate/encrypt/repair
      • compare.md - visual diff workflow
      • forms_annotations.md - fillable forms + appearance pitfalls + correctness checklist
      • forms_debugging.md - widget-level introspection + acceptable values
      • forms_nonfillable.md - stamp-by-boxes workflow for non-fillable forms
      • ocr.md - OCR scanned PDFs to searchable
      • preflight.md - quick triage + normalization guidance
      • redact.md - true redaction workflows
      • parity.md - render parity across engines
      • batch.md - batch helpers for corpora
      • create.md - choose reportlab/latex/html/docx/pptx pipeline
      • convert.md - docx/pptx/html/markdown/latex to PDF conversion
      • js_tools.md - pdf-lib/pdfjs helper CLIs
    • scripts/ (run these)

      • render_pdf.py - render to PNGs (pdfium or poppler)
      • compare_renders.py - render-and-diff two PDFs (pixel diff)
      • pdf_inspect.py - metadata/structure overview
      • pdf_extract.py - text/words/chars/tables/images/attachments/annots/forms
      • pdf_edit.py - editing toolkit (merge/split/select/rotate/crop/watermark/paginate/encrypt/repair/optimize)
      • pdf_preflight.py - preflight/triage warnings
      • pdf_redact.py - true redaction (remove underlying content)
      • renderer_parity.py - diff pdftoppm vs pdfium renders
      • batch_pdf.py - batch runner for common ops
      • box_picker_html.py - generate interactive HTML to pick rectangles -> JSON in PDF coords
      • place_text_by_boxes.py - stamp text/checkmarks into rectangles (non-fillable forms)
      • ocr_pdf.py - OCR wrapper
      • html_to_pdf.py, md_to_pdf.py, latex_to_pdf.py, lo_convert_to_pdf.py - conversion helpers
    • js/ (Node helpers)

      • install_deps.sh - installs pdf-lib + pdfjs-dist
      • fill_form.mjs - fill + optional flatten (supports flags and positional args)
      • extract_form_fields.mjs - list AcroForm fields
      • extract_text_pdfjs.mjs - extract text via pdfjs-dist
    • examples/

      • smoke_test.md - runnable smoke flows
    • troubleshooting/

      • common.md - common pitfalls and fixes

    Final deliverable expectations

    • No clipped text, overlaps, black squares, or broken glyphs in rendered PNGs.
    • Verify in at least one renderer (pdfium or pdftoppm). For tricky forms, verify in two.
    • Remove intermediate artifacts from the deliverable folder (keep only final PDF(s)).
    • Avoid Unicode dashes that some renderers mishandle; prefer ASCII -.

    גאה להיות חלק:
    otzaria.org

    א תגובה 1 תגובה אחרונה
    2
    • י. פל.י י. פל.

      אני לא מצליח להכניס הכל לתוך בלוק-קוד, משום שהוא נקטע באמצע...
      הכנסתי הכל לספויילר.

      name: pdfs
      description: Reliable, workflow-driven PDF processing: render → verify → operate → re-render/verify, covering reading, inspection, extraction, editing, forms, OCR, redaction, conversion, and diffing. Prefer authoring in DOCX or PPTX (then converting to PDF) for text-heavy docs or slide-like layouts; use ReportLab here for programmatic PDF generation.

      PDF Skill (Read • Inspect • Extract • Edit • Render • Forms • OCR • Redact • Convert • Diff)

      This skill is designed for reliable, workflow-driven PDF work: render -> verify -> operate -> re-render verify.

      Before you touch PDFs: should this be DOCX/PPTX instead?

      Even if the user asks for a PDF deliverable, the best workflow is often:

      • Text-heavy, business-doc layout (headings, TOC, long tables, rich lists) -> use the DOCX skill to author, then convert to PDF with lo_convert_to_pdf.py.
      • Slide-like visual layout (charts, callouts, fixed positioning, figure captions) -> use the Slides skill (PPTX) to author, then export to PDF.
      • Programmatic generation -> ReportLab (this skill) is fine.

      If you find yourself hand-tuning line breaks or typography in ReportLab, you probably picked the wrong authoring format.


      Core loop (always)

      1. Render to images
      python /home/oai/skills/pdfs/scripts/render_pdf.py input.pdf --out_dir /mnt/data/_renders/in --dpi 200
      
      1. Inspect PNGs (tables/figures/layout are authoritative)

      2. Perform the edit/extract/create

      3. Re-render and compare

      python /home/oai/skills/pdfs/scripts/compare_renders.py before.pdf after.pdf --out_dir /mnt/data/_diff --dpi 200
      

      Task index (progressive)

      Start with the smallest task that answers the user:

      Read / review

      • tasks/read_review.md

      Extract (text/layout/tables/images/attachments/forms)

      • tasks/extract.md
      • tasks/coords.md (coordinate sanity)

      Edit (merge/split/rotate/crop/watermark/paginate/encrypt/repair)

      • tasks/edit.md
      • tasks/compare.md (visual regression)

      Forms

      • Fillable forms: tasks/forms_annotations.md
      • Debugging/introspection: tasks/forms_debugging.md
      • Non-fillable / stamping workflow: tasks/forms_nonfillable.md

      OCR

      • tasks/ocr.md

      Preflight / normalize

      • tasks/preflight.md

      Redaction

      • tasks/redact.md

      Renderer parity

      • tasks/parity.md

      Batch processing

      • tasks/batch.md

      Create / convert

      • tasks/create.md
      • tasks/convert.md
      • tasks/js_tools.md (pdf-lib, pdfjs)

      Package map (where things live)

      This pack includes a manifest.txt that is a pure list of relative file paths used by download tooling.

      Quick map:

      • tasks/ (what to do)

        • read_review.md - render-first reading/review
        • extract.md - extract text/layout/tables/images/attachments/forms
        • coords.md - coordinate system cheatsheet (PDF pt vs image px)
        • edit.md - merge/split/select/rotate/crop/watermark/paginate/encrypt/repair
        • compare.md - visual diff workflow
        • forms_annotations.md - fillable forms + appearance pitfalls + correctness checklist
        • forms_debugging.md - widget-level introspection + acceptable values
        • forms_nonfillable.md - stamp-by-boxes workflow for non-fillable forms
        • ocr.md - OCR scanned PDFs to searchable
        • preflight.md - quick triage + normalization guidance
        • redact.md - true redaction workflows
        • parity.md - render parity across engines
        • batch.md - batch helpers for corpora
        • create.md - choose reportlab/latex/html/docx/pptx pipeline
        • convert.md - docx/pptx/html/markdown/latex to PDF conversion
        • js_tools.md - pdf-lib/pdfjs helper CLIs
      • scripts/ (run these)

        • render_pdf.py - render to PNGs (pdfium or poppler)
        • compare_renders.py - render-and-diff two PDFs (pixel diff)
        • pdf_inspect.py - metadata/structure overview
        • pdf_extract.py - text/words/chars/tables/images/attachments/annots/forms
        • pdf_edit.py - editing toolkit (merge/split/select/rotate/crop/watermark/paginate/encrypt/repair/optimize)
        • pdf_preflight.py - preflight/triage warnings
        • pdf_redact.py - true redaction (remove underlying content)
        • renderer_parity.py - diff pdftoppm vs pdfium renders
        • batch_pdf.py - batch runner for common ops
        • box_picker_html.py - generate interactive HTML to pick rectangles -> JSON in PDF coords
        • place_text_by_boxes.py - stamp text/checkmarks into rectangles (non-fillable forms)
        • ocr_pdf.py - OCR wrapper
        • html_to_pdf.py, md_to_pdf.py, latex_to_pdf.py, lo_convert_to_pdf.py - conversion helpers
      • js/ (Node helpers)

        • install_deps.sh - installs pdf-lib + pdfjs-dist
        • fill_form.mjs - fill + optional flatten (supports flags and positional args)
        • extract_form_fields.mjs - list AcroForm fields
        • extract_text_pdfjs.mjs - extract text via pdfjs-dist
      • examples/

        • smoke_test.md - runnable smoke flows
      • troubleshooting/

        • common.md - common pitfalls and fixes

      Final deliverable expectations

      • No clipped text, overlaps, black squares, or broken glyphs in rendered PNGs.
      • Verify in at least one renderer (pdfium or pdftoppm). For tricky forms, verify in two.
      • Remove intermediate artifacts from the deliverable folder (keep only final PDF(s)).
      • Avoid Unicode dashes that some renderers mishandle; prefer ASCII -.

      א מנותק
      א מנותק
      ארי
      כתב נערך לאחרונה על ידי
      #2

      @י.-פל. מה המטרה
      יצירת סקיל?

      י. פל.י תגובה 1 תגובה אחרונה
      0
      • א ארי

        @י.-פל. מה המטרה
        יצירת סקיל?

        י. פל.י מנותק
        י. פל.י מנותק
        י. פל.
        כתב נערך לאחרונה על ידי
        #3

        @ארי
        גם.
        וגם היכרות עם היכולות שלו.

        גאה להיות חלק:
        otzaria.org

        תגובה 1 תגובה אחרונה
        1

        • התחברות

        • אין לך חשבון עדיין? הרשמה

        • התחברו או הירשמו כדי לחפש.
        • פוסט ראשון
          פוסט אחרון
        0
        • חוקי הפורום
        • פופולרי
        • לא נפתר
        • משתמשים
        • חיפוש גוגל בפורום
        • צור קשר