RemoveSegID / SegIDClean & Reflow Instructions — ATR Last-Mile Cleanup Utility v1.3


BATCH 71 ROUTING NOTE — WITNESS/SĀKṢĪ + RESIDUE/CALQUE HARDENING — 1 June 2026

This prompt is not primarily a translation-review prompt. Do not overload it with full doctrinal rewriting. However, any artifact it formats, cleans, packages, audits, hands off, or approves must preserve the Batch 71 inherited hard gates: Universal Witness / Sākṣī Gate, visible link-text classification, romanized Sanskrit/Pāli/Tibetan residue classification, target-specific hybrid residue scans, Prompt 9 calque/fluency review, repeated-failure escalation, high-risk source-term review table, Strict HTML QA visible-text smoke test, exact href/src/code preservation, and hard no-repair promotion discipline. If translation, terminology, residue, or target-language fluency issues are detected or even suspected, route to Prompt 6 and Prompt 9 before any final/publishable status. Structural parity alone is not publishability.

BATCH 70 GLOBAL INHERITED MODULE — NO QA-BY-ASSERTION + HARD NO-REPAIR PROMOTION GATE + SOURCE-LANGUAGE RESIDUE REGRESSION AUDIT — 29 May 2026

This is an inherited managed-prompt module. It applies to every ATR managed prompt that can create, translate, edit, review, repair, format, audit, publish, upload, hand off, or package artifacts. It does not update the user's standalone intro prompt.

1. NO QA-BY-ASSERTION
Do not claim “reviewed,” “publishable,” “final,” “complete,” “material gates pass,” “ready,” “certified,” or equivalent status merely because you say Prompt A / Prompt 1 / Prompt 6 / Prompt 9 / Strict HTML QA was applied. A status claim is valid only after a fresh audit of the exact latest artifact. If a check was not actually performed, state: “Not fully verified: [specific check]. Human review required.” Do not substitute confidence language for verification.

2. EXACT-LATEST-ARTIFACT DISCIPLINE
Every repair, review, audit, final, or no-repair promotion pass must reopen and operate on the exact latest HTML/text artifact. Do not rely on memory, previous summaries, prior QA notes, filenames alone, earlier artifacts, or partially updated drafts. If the latest artifact is ambiguous, do not promote; identify the artifact used and list the ambiguity as a human-review item.

3. MATERIAL REPAIR INVALIDATES SAME-PASS PROMOTION
If any material edit is made during a pass, the artifact cannot be promoted in that same pass. Material edits include translation correction, source-language residue cleanup, mixed-language hybrid repair, href/src repair, URL-slug repair, omitted-section restoration, duplicated-section removal, doctrinal terminology repair, speaker-attribution repair, readability-affecting punctuation repair, paragraph/list/blockquote/section restoration, HTML structure repair, media/embed repair, attribute repair, title/link-text repair, or CSS/wrapper repair that changes rendered output.
If material repair occurred, use exactly: REPAIRED AFTER RE-AUDIT — material gates currently pass, no-repair promotion still required.
Only a later separate pass that reopens the exact latest repaired artifact and makes no material edits may use: reviewed final / publishable — strict-certified final not claimed.
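
The status rules in items 1 to 3 can be sketched as a small decision function. This is an illustration only: the function name and its parameters are hypothetical and not part of any ATR tool, and the sketch assumes that every check not listed as unverified was actually performed and passed.

```python
# Status strings copied from the rules above; everything else is illustrative.
REPAIRED = "REPAIRED AFTER RE-AUDIT — material gates currently pass, no-repair promotion still required"
FINAL = "reviewed final / publishable — strict-certified final not claimed"


def pass_status(fresh_audit_done, material_edits_made, unverified_checks=()):
    """Return the only status a pass may claim (hypothetical helper)."""
    # No fresh audit of the exact latest artifact, or a skipped check:
    # no status claim is valid, so report what was not verified.
    if not fresh_audit_done or unverified_checks:
        missing = ", ".join(unverified_checks) or "fresh audit of the exact latest artifact"
        return f"Not fully verified: {missing}. Human review required."
    # Any material edit blocks promotion in the same pass.
    if material_edits_made:
        return REPAIRED
    # Only a separate pass with no material edits may promote.
    return FINAL
```

The ordering matters: the unverified case is tested first so that a pass with no edits cannot reach the final status on confidence alone.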

4. PROTECTED-ENGLISH WHITELIST BEFORE RESIDUE SCANS
Before scanning for source-language residue, prepare a protected-English whitelist. English may remain only when it is: URL/code; HTML/CSS/JS attribute name, selector, class, ID, property, function, variable, or config value; proper name or author/person/group name; exact linked article/book/video/audio title intentionally preserved; quoted technical label intentionally preserved (e.g. I AM, AMness, anatta, anatman, Dzogchen, Mahamudra, Brahman, Sunyata, Maha, self/Self, Presence, Awareness, One Mind, No Mind, no-mind, non-dual, rigpa, Dharma, Buddhadharma, or another explicitly protected AtR/Dharma label); bilingual/navigation label intentionally retained; or a quoted original phrase that the source requires preserving. Everything else is ordinary English residue and must be translated or explicitly flagged for human review. Do not overprotect ordinary lowercase uses of otherwise protected terms.

5. MANDATORY SOURCE-LANGUAGE RESIDUE + HYBRID SCAN
Before any no-repair promotion, scan visible text outside code/style/script/href/src for ordinary English embedded in target-language prose; English + target-language hybrid particles/suffixes such as term-এর, term-কে, term-তে, term and term, term বা and equivalent patterns in other languages; lowercase ordinary English words including experience, realization, insight, practice, view, emptiness, awareness, consciousness, manifestation, phenomena, absolute, ultimate, mind, presence, sound, taste, vivid, ontological, dualistic, non-conceptual, meditation, teaching, guide, path, ground, source, self, no-self, non-self, dependent, origination, arising, luminosity, clarity, naturalness, ordinariness, spontaneity, subject, object, agent, doer, doership; corrupted mixed-script words; exact-title damage; residual source punctuation; and line-merge damage. If any material residue is found and repaired, no promotion is allowed in the same pass.
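
A minimal sketch of the visible-text scan described above, using only the Python standard library. The whitelist shown is a small illustrative subset, the function and class names are hypothetical, and the hybrid pattern (Latin word, hyphen, non-ASCII suffix) covers only the term-এর style of hybrid. It is a first-pass filter that produces candidates for human classification, not a replacement for the scan itself.

```python
import re
from html.parser import HTMLParser

SKIPPED_TAGS = {"script", "style", "code", "pre"}
PROTECTED = {"I AM", "Dzogchen", "Mahamudra", "rigpa"}  # illustrative subset only
LATIN_WORD = re.compile(r"[A-Za-z]+(?:['-][A-Za-z]+)*")
HYBRID = re.compile(r"[A-Za-z]+-[^\x00-\x7F]+")  # e.g. an English word plus a Bengali suffix


class VisibleText(HTMLParser):
    """Collects text nodes outside script/style/code/pre; attributes never reach handle_data."""

    def __init__(self):
        super().__init__()
        self.chunks = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in SKIPPED_TAGS:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in SKIPPED_TAGS and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth:
            self.chunks.append(data)


def scan_residue(html, protected=PROTECTED):
    """Return hybrid tokens and unprotected Latin-script words found in visible text."""
    parser = VisibleText()
    parser.feed(html)
    parser.close()
    text = " ".join(parser.chunks)
    # Hybrids are collected before whitelisting, so a protected term with a
    # target-language suffix attached is still reported.
    hybrids = HYBRID.findall(text)
    # Case-sensitive removal: ordinary lowercase uses are not protected.
    for phrase in sorted(protected, key=len, reverse=True):
        pattern = r"(?<![A-Za-z])" + re.escape(phrase) + r"(?![A-Za-z])"
        text = re.sub(pattern, " ", text)
    return {"hybrids": hybrids, "latin_words": LATIN_WORD.findall(text)}
```

Every word the scan returns still needs a decision: protected label, proper name, exact title, or ordinary residue to translate or flag.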

6. HREF/SRC AND ATTRIBUTE INTEGRITY GATE
Before promotion, compare source and target HTML for raw href count, raw src count, exact href/src values, non-source non-Latin characters inside href/src values, spaces inserted into URLs, translated URL slugs, missing/duplicated links, changed iframe/script/config values, and explicitly accounted added links. Never translate or mutate URL paths, href values, src values, iframe config values, CSS URLs, script values, IDs, classes, selectors, Blogger widget IDs, or code-like attributes. Human-facing alt/title/aria-label/placeholder/iframe-title text may be translated only when it is truly user-facing language and not code, URL, exact title, proper name, or protected label.
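
The href/src comparison above can be sketched with the standard-library HTML parser. The function names are hypothetical. One caveat: html.parser unescapes character references in attribute values, so this compares decoded values on both sides rather than the raw bytes; a byte-level check would need a separate raw-text comparison.

```python
from collections import Counter
from html.parser import HTMLParser


class UrlAttrs(HTMLParser):
    """Records every (attribute, value) pair for href and src attributes."""

    def __init__(self):
        super().__init__()
        self.values = []

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name in ("href", "src") and value is not None:
                self.values.append((name, value))


def url_attrs(html):
    parser = UrlAttrs()
    parser.feed(html)
    parser.close()
    return parser.values


def compare_urls(source_html, target_html):
    """Report href/src values missing from, added to, or mutated in the target."""
    source = Counter(url_attrs(source_html))
    target = Counter(url_attrs(target_html))
    added = sorted((target - source).elements())
    return {
        "missing": sorted((source - target).elements()),
        "added": added,
        # Values absent from the source that contain non-ASCII characters or
        # whitespace usually indicate a translated slug or a broken URL.
        "suspicious": sorted(
            {value for _, value in added if any(ord(c) > 127 or c.isspace() for c in value)}
        ),
    }
```

An empty report on all three keys means counts and exact values match. Any added link must still be explicitly accounted for by a human.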

7. COVERAGE AND STRUCTURE GATE
Before promotion, compare source and target for major section/stage count, heading sequence/order, paragraph count with accounted additions only, blockquote count, list-item count, dialogue/speaker-turn count where applicable, table and row/column preservation, media/embed count, article ending reached, style block closed before article body, no hidden untranslated source tail, no duplicated translated blocks, no dropped links/parenthetical notes/blockquotes/list items, and no CSS/script/code translated or damaged. Missing, duplicated, reordered, or structurally corrupted sections are material defects.
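
Part of the coverage gate above is mechanical counting, which can be sketched as follows. The tag set and function names are assumptions chosen for illustration. The sketch covers element counts and heading order only; checks such as dialogue/speaker-turn count, hidden untranslated tails, and duplicated translated blocks still need a reading of the text.

```python
from collections import Counter
from html.parser import HTMLParser

COUNTED = {"p", "blockquote", "li", "table", "tr", "iframe", "img", "video", "audio"}
HEADINGS = {"h1", "h2", "h3", "h4", "h5", "h6"}


class Structure(HTMLParser):
    """Counts structural elements and records the heading sequence."""

    def __init__(self):
        super().__init__()
        self.counts = Counter()
        self.headings = []

    def handle_starttag(self, tag, attrs):
        if tag in COUNTED:
            self.counts[tag] += 1
        if tag in HEADINGS:
            self.headings.append(tag)


def structure_of(html):
    parser = Structure()
    parser.feed(html)
    parser.close()
    return parser


def structure_diff(source_html, target_html):
    """Return only the structural measures that differ, as (source, target) pairs."""
    source, target = structure_of(source_html), structure_of(target_html)
    diffs = {
        tag: (source.counts[tag], target.counts[tag])
        for tag in sorted(COUNTED)
        if source.counts[tag] != target.counts[tag]
    }
    if source.headings != target.headings:
        diffs["heading_sequence"] = (source.headings, target.headings)
    return diffs
```

An empty result shows structural parity only. As stated in the routing note, structural parity alone is not publishability.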

8. RISK-BASKET REGRESSION
Every discovered defect becomes a session-specific risk-basket item. The next review pass must scan globally for the same defect type: all English+target suffix hybrids after one hybrid is found; all href/src/action/data-url/style URLs after one URL mutation is found; all exact linked titles after one title is partially translated; all section/paragraph/list/blockquote coverage after one dropped paragraph; ordinary lowercase uses after one overprotected term; related residue terms after one residue phrase; and all speaker-turn/order checks after one attribution issue.

9. NO-REPAIR PROMOTION CHECKLIST
A no-repair promotion pass must reopen the exact latest artifact; run visible-text residue scan; classify protected English; run href/src parity and non-Latin-in-URL scan; run translated URL-slug scan; run section/heading/paragraph/list/blockquote/media coverage checks; run dialogue/speaker-turn checks where applicable; run the terminology risk basket; run line-merge/punctuation audit; make no material edits; and state remaining exceptions such as URL_NEEDED_HUMAN_REVIEW. Only then may the artifact be marked: reviewed final / publishable — strict-certified final not claimed.

10. PACKAGING / HANDOFF STATUS
Package metadata and handoff prompts must state whether the artifact is “REPAIRED AFTER RE-AUDIT — material gates currently pass, no-repair promotion still required” or “reviewed final / publishable — strict-certified final not claimed.” Filenames must not imply finality if the artifact is repaired-only. Known human-review placeholders such as URL_NEEDED_HUMAN_REVIEW must be listed in QA metadata. Packaging must include a QA/changelog TXT explaining checks performed and exceptions remaining.
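
A sketch of the packaging checks above. The function name, its parameters, and the list of finality words searched for in filenames are all hypothetical; only the status string and the URL_NEEDED_HUMAN_REVIEW placeholder come from this module.

```python
import re

REPAIRED = "REPAIRED AFTER RE-AUDIT — material gates currently pass, no-repair promotion still required"
# Assumed list of words that would make a filename imply finality.
FINALITY_WORDS = re.compile(r"final|publishable|certified|approved", re.IGNORECASE)


def packaging_problems(filename, status, placeholders_in_artifact,
                       placeholders_in_metadata, has_qa_changelog):
    """Return a list of packaging defects; an empty list means none were found."""
    problems = []
    if status == REPAIRED and FINALITY_WORDS.search(filename):
        problems.append("filename implies finality for a repaired-only artifact")
    unlisted = sorted(set(placeholders_in_artifact) - set(placeholders_in_metadata))
    if unlisted:
        problems.append("placeholders missing from QA metadata: " + ", ".join(unlisted))
    if not has_qa_changelog:
        problems.append("QA/changelog TXT is missing")
    return problems
```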

Batch 16 Modernization Date: 28 April 2026
Batch 67 Dialogue-Block Preservation Update: 23 May 2026
Status: Live operational instruction section

BATCH 67 DIALOGUE-BLOCK PRESERVATION WARNING

During last-mile cleanup:
- Do not flatten .dialogue-turn-block.
- Do not convert inner paragraphs of .dialogue-turn-block into independent .dialogue-turn speaker boxes.
- Do not remove the first speaker label from a .dialogue-turn-block.
- Do not detach URLs, blockquotes, or numbered questions from their parent speaker block.
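
One way to detect flattening after cleanup is to compare class counts before and after. This is a rough sketch with hypothetical function names, and it assumes the two class names appear as separate tokens in class attributes. It catches blocks that disappear or turn into extra speaker boxes, but not a removed speaker label or a detached URL, which still need a visual check.

```python
from collections import Counter
from html.parser import HTMLParser

TRACKED = ("dialogue-turn-block", "dialogue-turn")


class ClassCounter(HTMLParser):
    """Counts elements carrying each tracked class token."""

    def __init__(self):
        super().__init__()
        self.counts = Counter()

    def handle_starttag(self, tag, attrs):
        classes = (dict(attrs).get("class") or "").split()
        for name in TRACKED:
            if name in classes:
                self.counts[name] += 1


def dialogue_counts(html):
    parser = ClassCounter()
    parser.feed(html)
    parser.close()
    return parser.counts


def dialogue_flattened(before_html, after_html):
    """True if blocks were lost or extra independent speaker boxes appeared."""
    before, after = dialogue_counts(before_html), dialogue_counts(after_html)
    return (
        after["dialogue-turn-block"] < before["dialogue-turn-block"]
        or after["dialogue-turn"] > before["dialogue-turn"]
    )
```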


PURPOSE
Use this section when cleaning text exported from ATR translation/review prompts that include SegID headers, PARA markers, Clean Copy banners, QA headings, or citation crumbs.

This is a last-mile cleanup workflow. It must not be used to rewrite, translate, summarize, or alter the meaning of the translated text.

LEGACY PRESERVED QUICK WORKFLOW
The older workflow instructed users to:

1. Copy the reviewed Clean Copy output.
2. Paste it into a Notepad file.
3. Save it as a UTF-8 .txt file.
4. Unzip RemoveSegID.zip and run RemoveSegID.exe.
5. Choose Mode 3 when using that older tool.
6. Drag and drop the .txt file into the console window instead of manually typing the path.
7. Press Enter.
8. Use the cleaned output file created next to the source file, such as filename_cleaned.txt.

CURRENT SEGIDCLEAN & REFLOW WORKFLOW
The newer SegIDClean & Reflow utility is preferred when available.

Expected function:

- removes SegID headers/prefixes, with or without trailing dot or colon;
- strips “Clean Copy” banners and QA/report headings;
- deletes inline citation crumbs such as [ISO][1] or Wikipedia+2 when they are artifacts, not source text;
- preserves paragraph breaks using PARA/SegID markers;
- collapses extra blank lines;
- writes a cleaned file next to the source, such as draft.cleaned.txt;
- if PARA markers are present, writes a second, reflowed file in which the lines of each original paragraph are rejoined into continuous prose, such as draft.cleaned.reflowed.txt.
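
The expected behaviour listed above can be sketched as follows. This is not the utility's actual code. The marker formats are assumptions made for illustration: SegID headers written like "SegID 12:", PARA markers alone on a line as "[PARA]", banners decorated with at least two symbols, and citation crumbs shaped like the two examples given. The real export format should be checked before adapting any pattern.

```python
import re

# All patterns below are assumed formats, not documented ones.
SEGID = re.compile(r"^\s*\[?SegID[ _-]?\d+\]?[.:]?\s*", re.IGNORECASE)
BANNER = re.compile(r"^\s*[=#*-]{2,}\s*(clean copy|qa report)\b.*$", re.IGNORECASE)
PARA = re.compile(r"^\s*\[?PARA\]?\s*$")
CRUMB = re.compile(r"\[[A-Za-z]+\]\[\d+\]|\b[A-Z][A-Za-z]+\+\d+\b")


def clean(text):
    """Strip SegID prefixes, banners, and citation crumbs; keep paragraph breaks."""
    out = []
    for line in text.splitlines():
        if BANNER.match(line):
            continue  # drop decorated banner/report headings
        if PARA.match(line):
            out.append("")  # a PARA marker becomes a paragraph break
            continue
        line = SEGID.sub("", line)
        line = CRUMB.sub("", line)
        out.append(re.sub(r"[ \t]{2,}", " ", line).rstrip())
    # Collapse runs of blank lines to a single blank line.
    return re.sub(r"\n{3,}", "\n\n", "\n".join(out)).strip() + "\n"
```

The crumb pattern matches ASCII square brackets only, so full-width bracketed translator notes such as 【译按：…】 pass through untouched. It cannot tell an artifact from a genuine bracketed reference in the source, so that judgment stays with the reviewer.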

SAFETY BOUNDARY
The cleanup utility edits exported text only. It must never be treated as an authority over the source translation.

Do not use cleanup to:

- fix mistranslations;
- remove difficult content;
- merge sections that should remain separate;
- erase original paragraphing without a reflow basis;
- delete quotations or source labels;
- remove footnotes that are real article content;
- remove bracketed translator notes such as 【译按：…】 when they are intended content;
- remove original-script quotations;
- remove intentional stage labels, speaker labels, or headings.

BEFORE CLEANUP
Confirm:

1. The translation/review output is complete.
2. The Clean Copy region is the intended region to clean.
3. The file is saved as UTF-8 text.
4. The file extension is .txt unless the tool explicitly supports another format.
5. You have a copy of the pre-cleaned file.
6. PARA markers, if present, are consistent.
7. SegID markers are not part of meaningful source text.
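
Items 3 to 5 of this checklist are mechanical and can be sketched as a pre-flight function. The function name and the .bak backup naming are hypothetical. Items 1, 2, 6, and 7 require reading the file and are not covered.

```python
import shutil
from pathlib import Path


def precheck(path):
    """Back up the file, then report extension and encoding problems."""
    path = Path(path)
    problems = []
    # Keep a copy of the pre-cleaned file (assumed naming: original name + ".bak").
    backup = path.with_name(path.name + ".bak")
    if not backup.exists():
        shutil.copy2(path, backup)
    if path.suffix.lower() != ".txt":
        problems.append("extension is not .txt")
    try:
        text = path.read_bytes().decode("utf-8")
    except UnicodeDecodeError as exc:
        problems.append(f"not valid UTF-8: {exc}")
        return problems
    # U+FFFD means an earlier lossy conversion already damaged the text.
    if "\ufffd" in text:
        problems.append("contains U+FFFD replacement characters")
    return problems
```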

AFTER CLEANUP QA
Open the cleaned file and check:

- title is present;
- first paragraph is present;
- middle section is present;
- final paragraph is present;
- no section was silently removed;
- paragraphs are not collapsed into one unreadable block;
- poems/verses/lists did not lose intentional lineation;
- quotes and source labels remain;
- diacritics remain intact;
- Chinese characters remain intact;
- links remain intact if present;
- no QA-report headings remain unless intentionally retained;
- no SegID/PARA markers remain unless intentionally retained.
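
Several of the checks above can be pre-screened by comparing the file before and after cleanup. A minimal sketch with hypothetical names follows. It flags leftover markers, lost non-ASCII characters (diacritics, Chinese characters), lost URLs, and the paragraph count. It does not replace opening the file: a lost character may come from a legitimately removed banner, and presence of title, middle, and final paragraph still has to be read.

```python
import re
from collections import Counter

MARKER = re.compile(r"\b(?:SegID|PARA)\b")
URL = re.compile(r"https?://\S+")


def non_ascii(text):
    return Counter(c for c in text if ord(c) > 127)


def post_cleanup_report(before, after):
    """Compare pre- and post-cleanup text and list items that need a human look."""
    lost_chars = non_ascii(before) - non_ascii(after)
    paragraphs = [p for p in re.split(r"\n\s*\n", after.strip()) if p]
    return {
        "leftover_markers": MARKER.findall(after),
        "lost_non_ascii": dict(lost_chars),
        "lost_urls": sorted(set(URL.findall(before)) - set(URL.findall(after))),
        "paragraph_count": len(paragraphs),
    }
```

A paragraph count of 1 on a long article is a strong hint that paragraphs collapsed into one block.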

WHEN TO USE THE REFLOWED OUTPUT
Use the .reflowed output when:

- PARA markers were used correctly;
- you want continuous prose with original paragraphing restored;
- the output will be pasted into a normal article, book, or blog post.

Check the reflowed output by eye before using it when:

- the text contains verse, gāthās, poetry, transcript lineation, tables, code, or prompt bodies;
- line breaks carry meaning;
- the source uses intentionally short standalone lines;
- the cleanup result visually changes structure.
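
The reflow step and the caution above can be sketched together. The "[PARA]" marker format, the 40-character threshold, and the function names are assumptions. A paragraph whose lines are all short is treated as possibly lineated (verse, gāthā, transcript) and left unjoined for a human decision. The default joiner is a space, which suits space-delimited scripts; Chinese text would normally need an empty joiner.

```python
import re

PARA = re.compile(r"^\s*\[?PARA\]?\s*$")  # assumed marker format


def split_paragraphs(text):
    """Group non-blank lines into paragraphs delimited by PARA marker lines."""
    paragraphs, current = [], []
    for line in text.splitlines():
        if PARA.match(line):
            if current:
                paragraphs.append(current)
                current = []
        elif line.strip():
            current.append(line.strip())
    if current:
        paragraphs.append(current)
    return paragraphs


def reflow(text, short_line=40, joiner=" "):
    """Join each paragraph into continuous prose unless it looks lineated.

    Returns the reflowed text and the indexes of paragraphs left unjoined.
    """
    out, flagged = [], []
    for index, lines in enumerate(split_paragraphs(text)):
        looks_lineated = len(lines) > 1 and all(len(l) <= short_line for l in lines)
        if looks_lineated:
            flagged.append(index)  # keep line breaks; needs a human decision
            out.append("\n".join(lines))
        else:
            out.append(joiner.join(lines))
    return "\n\n".join(out) + "\n", flagged
```

The heuristic is deliberately conservative: it prefers leaving a short prose paragraph unjoined over silently destroying intentional lineation.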

WHEN TO RETURN TO PROMPT 6 / PROMPT 9 INSTEAD
If the cleaned text reveals missing paragraphs, mistranslations, inconsistent terminology, broken quotes, bad paragraphing, or doctrinal errors, do not keep cleaning mechanically. Return to Prompt 6 for review or Prompt 9 for source-anchored refinement.

SIMPLIFIED / TRADITIONAL CHINESE CONVERSION NOTE
If using the separate Simplified-to-Traditional / Traditional-to-Simplified converter:

1. Treat it as character conversion only, not translation.
2. Check Buddhist technical terms after conversion.
3. Check proper nouns, names, Sanskrit/Pāli/Tibetan transliterations, and quoted classical passages.
4. Do not assume automated conversion preserves regional terminology preferences.
5. Keep a copy of the pre-conversion file.

FINAL STATUS
A cleanup step is complete only after the cleaned file is opened and checked. Do not claim the final text is clean merely because the utility produced an output file.
