Structural audits of theoretical research.

Constraint-based evaluation, published verbatim.

AI Physics Review Methodology

A transparent, rule-bound framework for structural evaluation, selection, and publication.
All final published evaluations are generated with GPT-5.x under a fixed deterministic protocol.

AIPR evaluates structural readiness, not correctness, prestige, or scientific importance.

Purpose of the Review

AI Physics Review (AIPR) evaluates the structural presentation of theoretical manuscripts.
The Review measures how clearly a paper formulates its mathematical framework, defines its assumptions,
and traces its internal logic. It does not attempt to determine whether a theory is correct, important,
or ultimately accepted by the scientific community. Instead, AIPR focuses on whether the work is
structurally readable and logically organized so that its claims can be examined by others.

The research conditions that motivated the development of this methodology are discussed in
Leveling the Playing Field in Theoretical Research,
which examines the increasing role of AI-mediated discovery and structural filtering in theoretical research visibility.

Structural evaluation, not endorsement.
Publication in AIPR indicates that a manuscript demonstrates a clearly articulated structure under the
evaluation rubric. It does not imply endorsement of the theory, confirmation of its conclusions, or
validation of its scientific claims.
Deterministic criteria.
Manuscripts are evaluated using a fixed rubric that measures mathematical formalism, equation and
dimensional integrity, assumption clarity, logical traceability, and scope coverage. These criteria
are applied consistently across all evaluations.
No prestige weighting.
Institutional affiliation, citation counts, publication history, and author reputation are not used in
the evaluation process. Each manuscript is assessed solely on the structure of the document provided.
Transparent publication pipeline.
The evaluation and publication procedures used to produce each issue are publicly documented below.
The protocols used to generate scores, overviews, and final entries are published in full to allow
independent inspection of the process.

Methodological Principles

The AIPR methodology separates structural evaluation from questions of scientific truth, reputation, or disciplinary consensus. The Review operates under declared rules, bounded scope, and reproducible procedures. Each stage of the process, from evaluation through publication, is governed by explicit protocols so the resulting outputs can be inspected, understood, and replicated. The purpose is not to determine which theories are correct, but to measure whether a manuscript presents its framework in a structurally coherent and analytically traceable form.

Structure over authority.
Manuscripts are evaluated on the clarity and organization of their internal structure rather than the reputation, affiliation, or prior work of the author.
Explicit procedures over discretionary judgment.
Evaluations are produced through predefined protocols and scoring gates rather than open-ended editorial judgment.
Selection does not adjudicate correctness.
Appearance in AIPR reflects structural readiness under the evaluation rubric and does not determine the scientific validity or acceptance of the work.
Protocol-governed publication.
Issue construction follows a fixed pipeline in which evaluation outputs, manuscript overviews, and addenda are generated through documented procedures rather than editorial improvisation.

Evaluation Model and AI Baseline

All final AIPR evaluations are generated using GPT-5.x under the fixed AIPR review protocol.
The model executes the evaluation procedures exactly as defined in the published protocol files,
producing the structural assessment, scoring outputs, and manuscript overview components used in the Review.

During development of the methodology, the protocols were tested with multiple large language models, including Copilot, Perplexity, Gemini, and other models. These tests were conducted to evaluate stability of scoring behavior and protocol compliance across different systems.

Testing showed that different models produce different baseline scoring distributions and levels of
evaluation stability when executing the same protocol instructions. Because of this variation, all
evaluations published in AIPR use GPT-5.x as the fixed evaluation baseline to ensure consistency
across manuscripts and across issues.

The AIPR protocols are model-agnostic and can be executed with other LLMs for exploratory or
comparative purposes. However, structural scores produced by different models should not be
treated as directly comparable. For publication within AIPR, GPT-5.x is used as the frozen
evaluation baseline.

The MEALS Framework

AIPR evaluations are produced using the MEALS framework, a structured rubric designed to measure the clarity and internal organization of theoretical manuscripts.
Each manuscript is evaluated across five structural dimensions that together assess how well the work presents its mathematical framework, defines its assumptions, and traces the logical development of its arguments. The resulting structural score reflects the quality of the manuscript’s presentation and analytical structure under the evaluation protocol.

M – Mathematical Formalism (weight 3)
E – Equation and Dimensional Integrity (weight 3)
A – Assumption Clarity and Constraints (weight 2)
L – Logical Traceability (weight 2)
S – Scope Coverage (weight 1)

The MEALS score measures structural readiness of the manuscript as written. It is not a proxy for correctness, scientific importance, novelty, download metrics, citation counts, or institutional prestige. A high structural score indicates that the theoretical framework is clearly presented and logically traceable, allowing the work to be examined and debated on its merits.

Independence from Popularity Signals

AIPR is designed to evaluate structural characteristics of manuscripts rather than their visibility, prestige, citation history, or readership. To test this property directly, MEALS aggregate scores were compared against Zenodo download counts for a sample of papers.

The diagnostic below plots download counts and aggregate structural scores across the same sample, with papers ordered by download count. At very low download levels, scores tend to cluster toward lower values; however, beyond a minimal visibility threshold, no consistent monotonic relationship between readership and structural score is observed. Papers with high download counts appear across a wide range of MEALS outcomes, and lower-download papers likewise span a broad portion of the score range above this threshold.

This indicates that the MEALS framework does not simply reproduce popularity signals. Download counts measure attention and circulation. MEALS measures structural clarity, formal organization, assumption transparency, logical traceability, and scope discipline. The two are not treated as interchangeable.

Line plot comparing Zenodo download counts and MEALS aggregate scores across a sample of papers, showing no clear correlation between popularity and structural score. — **Figure.** Zenodo download counts and MEALS aggregate scores for a sample of papers. Papers are ordered by download count. The broad spread of structural scores across the full download range indicates that AIPR scoring is not governed by popularity metrics.

Selection and Inclusion Logic

Manuscripts included in AIPR are selected through the structural evaluation protocol described above.
Each paper is evaluated under the MEALS framework and assigned a structural score based on the clarity, internal consistency, and traceability of the manuscript as written. Papers appearing in an issue are those that demonstrate strong structural presentation under this fixed rubric during the evaluation window.

Selection by structural readiness.
Manuscripts are included when they demonstrate clear structural organization and score strongly under the MEALS framework during evaluation.
No prestige weighting.
Citation counts, download metrics, institutional affiliation, and author reputation are not used
in the selection process.
Editorial and procedural lanes are distinct.
The Review may include historically significant or foundational papers as legacy features. These
selections are editorially designated and remain separate from the procedural evaluation lane used
for contemporary manuscripts.
Capacity-limited inclusion.
Each issue features a limited number of manuscripts. Strong structural evaluations may therefore
exceed available space in a given issue, and absence from an issue does not imply structural deficiency. Because evaluation capacity is limited, not all structurally strong papers can be featured in a given issue. Authors who wish to ensure evaluation should submit their work to the AIPR Zenodo community: https://zenodo.org/communities/ai-physics-review/

Author Eligibility and Recurrence Model

If an author has appeared in a prior published issue of AI Physics Review, submissions made through the AIPR community become eligible for inclusion again after a fixed interval of 180 days from the publication date of the author’s most recently selected manuscript.

This constraint applies only to community submissions and does not affect the evaluation process. All manuscripts are evaluated independently under the published structural criteria. Selection remains subject to issue capacity and procedural inclusion rules.

Publication Pipeline

Each AIPR issue is produced through a defined multi-step pipeline. The process begins with manuscript selection and structural evaluation, and proceeds through automated overview synthesis and issue assembly. The protocols governing these steps are published in full in the Protocol Archive section below to allow inspection and reproducibility of the process.

Evaluation intake.
Candidate manuscripts are identified for evaluation based on structural interest and availability
of the manuscript text.
Structural evaluation.
The manuscript is evaluated under the AIPR protocol using the MEALS framework, producing the
structural score and evaluation notes.
Overview synthesis.
Evaluation outputs are consolidated into a structured overview that summarizes the conceptual
framework, governing mechanisms, limiting regimes, and strengths of the manuscript.
Issue construction.
Individual paper entries are assembled into the issue layout, including evaluation summaries,
structural scores, and manuscript metadata.
TOC generation and publication.
The issue table of contents is generated from the completed entries and the issue is published
as a finalized AIPR release.

Protocol Archive – Protocols used to produce AIPR Volume 1, Issues 0–2

LLM Implementation Note.
The AIPR protocols were tested with multiple large language models including Copilot, Perplexity, Gemini, and others. Different systems produced different baseline scores and levels of evaluation stability. Final evaluations published in AIPR Volume 1 were generated using GPT-5.x, which demonstrated the most consistent structural scoring under the AIPR protocols during testing. The protocols themselves are model-agnostic and can be executed with other LLMs; however, results may vary depending on model behavior and instruction fidelity. Structural scores produced by different models should not be treated as directly comparable.

The operational protocols that implement the publication pipeline described above are published below for transparency and reproducibility. They are presented verbatim to document the procedures used during production of each AIPR issue. Each section can be expanded to view the full instructions used during production, and the original protocol files are available for download.

Evaluation Pipeline

These steps produce the structural evaluation record for a manuscript using the MEALS framework. The process enforces clean-room evaluation, performs multiple independent scoring runs, and aggregates the results into a stable structural score.

Evaluation Step 1 – Clean-Room Evaluation Declaration

Execute in a new GPT-5.x chat

This declaration establishes the clean-room evaluation environment required before any structural scoring takes place.
It enforces strict isolation from prior context, stored memory, external knowledge, or reputational signals, ensuring that the manuscript is evaluated solely on the basis of the text provided.

Download protocol file

CLEAN-ROOM EVALUATION DECLARATION (Hard Constraint)

Treat this chat as a clean-room evaluation.

1) Context exclusion
Do not use any stored memory, saved profile data, prior conversation context, or cross-chat continuity about the user, the author, ECT, CTI, AI Physics Review, or any related projects. Do not use inferred familiarity based on writing style, terminology, recurring constructs, or past exposure. Assume the evaluator has never encountered the author or the theory before this session.

2) Source restriction
Use only the documents explicitly attached or pasted into this chat as the sole information sources. Do not use web browsing, external databases, tool-based lookups, training-set recollection, or general background knowledge to fill gaps. Do not import definitions, standard results, or conventional interpretations unless they are explicitly stated within the attached documents.

3) No gap filling
If a definition, assumption, symbol, step, or inference is not explicitly provided in the attached documents, treat it as missing rather than reconstructing it from familiarity or common practice. Do not repair the artifact mentally. Do not infer author intent. Do not supply bridging explanations not present in the text.

4) Document-anchored reasoning only
All claims, characterizations, and evaluations must be grounded in explicit in-document anchors at the finest available granularity, such as section numbers, equation labels, lemma identifiers, appendix markers, or footnotes. Abstract-only references are prohibited when later instantiated material exists. Artifact titles, file names, DOIs, upload labels, or repeated global identifiers may appear only once in the Required Header and must not appear in Sections A–E unless quoted verbatim as part of a cited in-document anchor. Repetition of the artifact title as contextual filler or pseudo-grounding is prohibited. If a claim cannot be anchored structurally, it must be omitted or marked NA under the active protocol.

4A) Tool-Citation Prohibition
The evaluation body (Sections A–E) must not use file-based citation markup, filecite references, DOI tags, or any tool-generated citation mechanism. Only in-document structural anchors (section numbers, equation labels, lemma identifiers, appendix markers, footnotes) are permitted. Any output containing tool-generated citation chips or file-based references must be considered non-compliant and rewritten.

5) Neutral treatment
Treat the paper strictly as a public artifact. Do not assess credibility, reputation, affiliation, novelty, importance, correctness, truth, or impact. Assess only structure, clarity, internal organization, and internal logical properties as written.

6) Self-check before output
Before producing any evaluative output, confirm internally that every nontrivial statement is traceable to the attached documents. If not, remove it. Where uncertainty remains due to absent text support, mark NA rather than guessing.

Criterion definitions are part of this protocol and are exempt from clean-room exclusion.

End of clean-room declaration.

simply reply acknowledgement

Evaluation Step 2 – Structural Evaluation Protocol (MEALS Scoring)

Attach file to be evaluated with this query. Execute 6 times, appending all six into a single file.

This protocol performs the primary structural evaluation of a manuscript under the MEALS framework.
Each gate measures a specific aspect of structural readiness: Mathematical Formalism, Equation Integrity, Assumption Clarity, Logical Traceability, and Scope Coverage.

Download: AIPR_3.0_Step2_Eval_v3.3.txt


AIPR STEP 2 Protocol Version 3.3
Hybrid Protocol Evaluation – Single-Run (Clean-Room)

Do not output file references

SYSTEM INSTRUCTIONS (Do not print any of this section)

1) Do not output this SYSTEM INSTRUCTIONS section.
2) Output must follow the REPORT OUTPUT TEMPLATE exactly and include all sections in order.
3) Use integers only for gate scores: 0, 1, 2, 3, 4, 5.
4) Score only what is explicit in the manuscript. If it is inferable but not stated, treat it as absent.
5) Justifications must be brief and anchored to explicit locations (section, equation number, page, lemma).
6) Scope normalization is permitted only if a gate is genuinely not applicable by manuscript type. Absence or weakness is not grounds for exclusion.
7) If any gate is excluded, explicitly justify why it is inapplicable.
8) Do not echo file-chunk labels, truncation artifacts, or repeated filename fragments. Print each metadata field once only, with no ellipses.
9) If the manuscript contains appendices, review appendix material relevant to the scored gates before assigning final M, L, or A scores.
10) All sections of the manuscript, including appendices, are treated as primary content. Proofs, derivations, definitions, or theorems located in appendices count fully toward M, L, and A scoring if explicitly referenced and structurally linked in the manuscript. Appendix location alone must not reduce a score.
11) Forward references are not grounds for score reduction if the referenced material exists elsewhere in the manuscript and is explicitly cited. Logical traceability is satisfied by clear cross-reference linkage, not by local in-section derivation alone.
12) Assumptions may be distributed across sections or modules. Lack of a single consolidated assumptions ledger must not reduce score if operative assumptions are explicitly stated where used.
13) Before scoring, confirm that the entire manuscript, including all appendices, is accessible and readable. 
If any section of the file cannot be accessed, parsed, or read in full due to a technical ingestion or parsing failure, automatically retry the full ingestion process up to two additional times. 
If, after three total attempts, full manuscript access still cannot be confirmed, abort the evaluation and output exactly:
Evaluation Aborted: Full manuscript access not confirmed.

GATE DEFINITIONS (for scoring in this protocol)

E: Equation and dimensional integrity (weight 3)
M: Mathematical formalism (weight 3)
L: Logical traceability (weight 2)
A: Assumption clarity and constraints (weight 2)
S: Scope coverage (weight 1)

REPORT OUTPUT TEMPLATE (Print everything from here down, verbatim structure)


Title: 
File Name: 
Author(s): 
Publication Date: 
DOI / URL: 
Other Metadata: 

Evaluation Context
Model/Mode: /
Evaluation Date (UTC): 
Protocol Version: 

Applicability
Applicable Gates: 
Number Applicable (k): 
Scope Normalization Factor (k/5): 

Weights
E=3, M=3, L=2, A=2, S=1

Gate Scores and Weighted Contributions

| Gate | Score (0–5) | Weight | Weighted |
|------|-------------|--------|----------|
| E    |        | 3      |     |
| M    |        | 3      |     |
| L    |        | 2      |     |
| A    |        | 2      |     |
| S    |        | 1      |     |

Run Score
Weighted Sum (0–55): 
Scope Normalization Factor (k/5): 
Scope-Normalized Score (0–55): 

Brief Justifications (1–3 sentences each, anchored)
E: <...>
M: <...>
L: <...>
A: <...>
S: <...>

Flags (non-numeric, comma-separated, optional)


(output the following text, not this instruction.  DO NOT skip this instruction)

=============================

Evaluation Step 3 – Multi-Run Aggregation and Stability Analysis

Attach file with 6 evaluations to this query.

Multiple independent evaluation runs are aggregated to produce a stable structural score.
The aggregation procedure performs trimming, bimodal detection, and variance classification to identify a consistent structural readiness estimate.

Download: AIPR_3.0_Step3_Aggregator_v1.16.txt


AIPR_3.0 STEP 3 v1.16 – AGGREGATION TASK (Robust Trim + Bimodal Detection, Compact Output)
Input
One text file containing multiple completed AIPR v3.0 single-run outputs for ONE manuscript under the same Protocol Version.
VALID_RUN definition
A run is VALID only if it contains:
Integer gates E,M,L,A,S each in {0,1,2,3,4,5}
Numeric Scope-Normalized Score (0–55)
If any are missing, non-numeric, out of range, or marked ABORT / unreadable → INVALID.
Do not infer missing data.
Metadata resolution
MANUSCRIPT = most frequent exact Title among VALID runs; tie → AMBIGUOUS; none → UNKNOWN.
AUTHOR = most frequent exact Author(s); tie → AMBIGUOUS; none → UNKNOWN.
FILENAME = filename attached to this aggregation query.
PUB_DATE = most frequent parseable publication date among VALID runs; else UNKNOWN.
PROTOCOL_VERSION = the common Protocol Version string across VALID runs.
Run extraction
Parse runs in appearance order.
For each run assign:
RunID (1..K), Valid (YES/NO), Score (float), E,M,L,A,S.
INVALID runs must appear in RUN_TABLE with Valid=NO and Kept=NO.
Let N = count(VALID runs).
If N = 0 → STATUS: NO_VALID_RUNS and stop.
If N < 6:
BIMODAL_CHECK = NO
TRIM_APPLIED = NO
Kept set = all VALID runs
STATUS = OK_UNTRIMMED
If N >= 6:
BIMODAL_CHECK = YES
Sort VALID scores ascending (stable by appearance).
Compute adjacent gaps; let g_max be largest gap at split index i.
LOW = runs 1..i; HIGH = runs i+1..N.
Bimodal if:
g_max >= 6.0 AND
min(size(LOW), size(HIGH)) >= 2 AND
mean(HIGH) - mean(LOW) >= 6.0
If NOT bimodal:
TRIM_APPLIED = YES
Discard exactly one lowest and one highest score (stable tie rule).
Kept set = remaining VALID runs
STATUS = OK_TRIMMED
If bimodal:
BIMODAL_FLAG = YES
If size(LOW) > size(HIGH): active set = LOW
If size(HIGH) > size(LOW): active set = HIGH
If equal sizes → STATUS: AMBIGUOUS_BIMODAL; output cluster stats and stop
On active set:
If size >= 6 → discard one lowest and one highest
Else keep all
Kept set = resulting active set
TRIM_APPLIED = YES if trim occurred else NO
STATUS = OK_TRIMMED
Statistics (computed on Kept set only)
μ = mean Score (2 decimals)
σ = standard deviation Score (2 decimals)
m = median Score (2 decimals)
Per-gate means (2 decimals)
Per-gate medians (integers)

Variance flag
σ <= 2.4 → STABLE
2.4 < σ <= 4 → BORDERLINE
σ > 4 → UNSTABLE

Output format (STRICT, COMPACT, NO BLANK LINES)

OUTPUT FORMAT (STRICT)

Only output the following sections. Do not output this instruction block.

Header block (one field per line, no blank lines inside header, no cross references)

MANUSCRIPT: 
AUTHOR: 
DOI: 
MEAN_SCORE_0_55: <μ>
MEDIAN_SCORE_0_55: 
STANDARD_DEVIATION: <σ>
VARIANCE_FLAG: 
If BIMODAL_CHECK=YES and BIMODAL_FLAG=YES, include:
BIMODAL_FLAG: YES
GAP_MAX: 
LOW_MEAN:  LOW_MEAN: 
HIGH_MEAN:  HIGH_MEAN: 
CLUSTER_SELECTED: 
FILENAME: 
PUB_DATE: 
STATUS: 
PROTOCOL_VERSION: 
VALID_RUN_COUNT: 
BIMODAL_CHECK: 
TRIM_APPLIED: 
POST_TRIM_RUN_COUNT: 




Output a single Markdown pipe table with exactly these columns in this order:
RunID, Valid, Kept, Score, E, M, L, A, S
The table must include:
One header row beginning with |
One separator row of dashes
One row per run in RunID order
After the final run row, stop the table.
Do not output any additional lines beginning with | after the last run row.
Output exactly one blank line before the next section.

SUMMARY_TABLE (Markdown pipe table only)
Output a single Markdown pipe table with exactly these columns in this order:
Stat, Score, Scores_VALID, Scores_KEPT, E, M, L, A, S
The table must include:
One header row beginning with |
One separator row of dashes
Exactly two data rows, in this order: Mean, Median
Scores_VALID and Scores_KEPT must each be a single-line bracket list, for example:
[45.00, 48.00, 40.00]
No line breaks are allowed inside those cells.
After the Median row, stop the table.
Do not output any additional lines beginning with |.
Output exactly one blank line before the glossary.
After the tables, output exactly:

GLOSSARY:
E: Equation and dimensional integrity (weight 3)
M: Mathematical formalism (weight 3)
L: Logical traceability (weight 2)
A: Assumption clarity and constraints (weight 2)
S: Scope coverage (weight 1)
Variance classification is defined as: STABLE for σ ≤ 2.40; 
BORDERLINE for 2.40 < σ ≤ 4.00; UNSTABLE for σ > 4.00.

=============END OF REPORT================

Overview Construction Pipeline

These steps transform the evaluation record into the published AIPR review entry. The process generates neutral manuscript overviews, consolidates multiple summaries, and appends the final structural score addendum displayed with each paper.

Overview Generation Step 1 – Clean-Room Evaluation Declaration

Execute in a new GPT-5.x chat

Download protocol file

CLEAN-ROOM EVALUATION DECLARATION (Hard Constraint)

Treat this chat as a clean-room evaluation.

Criterion definitions are part of this protocol and are exempt from clean-room exclusion.

End of clean-room declaration.

simply reply acknowledgement

Overview Generation Step 2 – Neutral Manuscript Overview Generation

Attach the same file to be evaluated. Execute 3 times appending each into a single file.

This protocol generates the standardized descriptive overview for a manuscript.
The overview summarizes the structure of the work without introducing evaluative language, rankings, or reputation signals.

Download: Final_Report_Step2_Overview_creation_v1.74.txt

AIPR Overview Generator STEP 2 v1.74

You will receive 1 attached paper file.

Produce a single standardized paper overview in a clean-room environment.
Base your output exclusively on the contents of the attached file.
Do not reference any external evaluations, metrics, prior scores, citation counts, reputation signals, institutional standing, or publication prestige.

Before generating the overview:
Confirm that the entire manuscript, including appendices, is accessible and readable.
If any portion cannot be accessed or parsed, retry ingestion up to two additional times.
If full access cannot be confirmed after three attempts, output exactly:
Evaluation Aborted: Full manuscript access not confirmed.

Write in a neutral, declarative academic tone.
No references to the attached file except through the header metadata fields.
No promotional language.
No consensus framing.
No comparative ranking.
No weaknesses or criticisms.
No speculation beyond what is explicitly stated in the manuscript.
No em dashes.

OUTPUT FORMAT (exact)
Return the entire overview inside a single triple-backtick code block. Do not include any text before or after the code block. Version evaluated refers to the specific document artifact used during analysis
(e.g., repository version, publisher PDF, archived manuscript copy, or author manuscript).

Title:
Author(s):
Publication date:
Version:
Version evaluated:
DOI:

OVERVIEW
[350–600 words, plain paragraphs with short section headings allowed.
Maximum 600 words.

Begin with a clear summary paragraph that reflects the primary emphasis of the manuscript and is accessible to an educated science reader. Explain the central problem addressed and describe the main contribution or development presented.

After the opening paragraph, organize the remainder using short neutral section headings where appropriate. Headings may include, but are not limited to:

Core Framework
Governing Equation
Fundamental Objects
Derived Mechanisms
Mathematical Structure
Simulation Architecture
Key Results
Limiting Regimes
Empirical, Experimental, or Computational Interface

Use only headings that are explicitly supported by the manuscript. Omit categories that do not apply. Do not insert placeholder statements such as “none provided.”

If the manuscript presents testable predictions, falsifiability criteria, experimental comparisons, numerical validation, or measurable deviations, describe them factually without assessing feasibility, significance, or likelihood. If the manuscript does not include such content, omit this section entirely.

Construct Fidelity Rules:
- If the manuscript defines a central wave function, operator, tensor, equation, principle, dataset, algorithm, or architecture by name or acronym, it must be named exactly as labeled.
- Do not replace named constructs with generic paraphrases.
- At first appearance, define each named construct in plain technical language and explain its structural role before presenting formal expression.
- Limit symbolic expressions to no more than three inline expressions.
- Avoid stacking multiple operator or tensor definitions in a single sentence.

Narrative Discipline:
- Maintain explanatory clarity without promotional tone.
- Integrate structural details into the narrative rather than listing isolated terms.
- Do not append bullet lists of technical nouns.
- Do not include calls to action.
- Do not include external links or references beyond what appears in the manuscript.

Conclude with a short synthesis paragraph summarizing what the paper formally establishes, derives, computes, or demonstrates, and under what stated assumptions or regimes.]

At the very end of the output, include the following delimiter exactly as written:

=============================

Overview Generation Step 3 – Consolidated Overview Construction

Attach aggregate overview produced in step 4 as a file.

Three independent overviews are reconciled into a single consolidated overview that preserves construct fidelity while improving narrative clarity and structure.

Download: Final_Report_Step3_Consolidation_v1.5.txt

STEP 3 – CONSOLIDATED OVERVIEW PROTOCOL v1.5

Input:
A single input file containing three independent Step 2 overviews for the same manuscript. No access to the original paper is permitted. No new claims may be introduced beyond what appears in at least one of the three Step 2 overviews.

Primary Objective:
Produce one consolidated overview that improves narrative flow, removes redundancy, preserves construct fidelity, enhances clarity for an educated science reader, maintains neutral academic tone, and is organized in a structure suitable for later HTML conversion.

Constraints:
Do not introduce new mechanisms or claims not present in the Step 2 outputs.
Do not use evaluative language.
Do not use consensus framing.
Do not use promotional tone.
Do not introduce criticism unless explicitly present in the Step 2 material.
Do not use em dashes.
Use structurally neutral descriptive language. Avoid rhetorical or advocacy-oriented verbs. Describe what the manuscript formulates, defines, models, or derives rather than what it promotes, advances, or proves, unless such phrasing appears in the Step 2 material.
Preserve all core constructs that appear in at least two of the three overviews.
Where descriptions differ slightly, reconcile them conservatively without expanding scope.
At the beginning of each major section, include one or two orientation sentences explaining what that section establishes, without reducing technical detail.

OUTPUT FORMAT
Return the entire consolidated overview inside a single triple-backtick code block. Do not include any text before or after the code block.

CONSOLIDATED OVERVIEW
Title:
Author(s):
Publication date:
Version:
Version evaluated:
DOI:

Conceptual Summary
Open with a narrative paragraph that situates the manuscript within a broader scientific question. Clearly explain the central problem it addresses, the core conceptual move it introduces, what the framework proposes, and how it differs structurally from conventional approaches. The first paragraph should allow a well-prepared undergraduate to understand what is at stake while remaining rigorous enough for graduate-level readers. A second paragraph may narrow the focus and guide the reader toward the formal architecture developed in the sections that follow.

Core Framework
Begin by introducing the fundamental objects the manuscript treats as primitive and explain why they are taken as the structural starting point. Clarify how these objects organize the overall theory before moving into formal definitions. Then describe the principal fields, operators, derived quantities, governing equations, and coupling relations in continuous prose. Present central equations inline where they are structurally necessary. Maintain technical precision while embedding definitions in explanatory context rather than presenting them as isolated formal statements.

Governing Mechanisms
Open with a brief explanation of how the system is intended to operate as a coupled dynamical structure. Clarify how wave evolution, geometric response, operator structure, and conservation laws work together before presenting formal expressions. Then describe the specific mechanisms in detail, preserving neutrality and full technical completeness while ensuring that the functional role of each component is clear.

Limiting Regimes and Reductions
Introduce this section by explaining that it examines how the framework relates to established physical theories under controlled conditions. State explicitly which known regimes or limits are recovered, what assumptions or parameter constraints are required, and how those reductions are obtained. Describe these connections conservatively and without extending beyond what appears in the Step 2 material.

Stationary Structure or Computational Results
Begin by clarifying whether the manuscript analyzes stationary configurations, develops spectral structure, presents numerical evolution, or reports computational outcomes. Provide context for why this analysis matters within the framework before moving into operator definitions, spectral properties, existence and stability results, numerical schemes, parameter regimes, emergent structures, or dataset outputs as applicable. Let the emphasis arise naturally from the manuscript’s content rather than labeling its type.

Empirical or Testable Implications
Open by indicating whether the manuscript advances measurable or observational consequences and in what regime those consequences arise. If such implications appear in the Step 2 outputs, summarize them clearly and conservatively, specifying what quantities are modified and under what conditions. If none are present in the Step 2 material, state exactly: No explicit experimental or observational implications are presented in the manuscript.

Formal Scope and Stated Assumptions
Introduce this section by clarifying that it delineates the assumptions and boundaries within which the framework is claimed to operate. Summarize explicit hypotheses, regularity requirements, controlled parameter regimes, and declared limits of claim. Conclude with a final paragraph that synthesizes how the framework’s structure, mechanisms, and limits fit together, without adding interpretation beyond the Step 2 material.

At the very end of the output, append exactly the following three lines. The third line must be a blank line. No additional text may follow.
-
[[[[ SOURCE OVERVIEWS USED FOR CONSOLIDATION ]]]]

Overview Generation Step 4 – Structural Score Addendum

Append to the consolidated overview produced in the previous step.

The final protocol extracts structural strengths and the MEALS aggregate score from the evaluation record and produces the concise addendum displayed beneath each paper overview.

Download: Final_Report_Step4_Strengths_and_Scores_v1.4.txt


STEP 4 – EVALUATION ADDENDUM PROTOCOL -Scoring (v1.4)

Input: One artifact file containing the finalized AIPR evaluation report for a single manuscript. The file includes MEAN_SCORE_0_55, per-run results with retained versus trimmed status, aggregate gate means, gate weights, and anchored justifications. No access to the original manuscript is permitted. All scoring values, gate means, weights, and Strengths signals must be extracted exclusively from the evaluation content within this file.

Objective: Produce a concise addendum to append beneath the Step 3 overview. The addendum must (1) begin with a narrative Strengths section, (2) present the MEALS aggregate score, and (3) present aggregate gate means in M E A L S order with full gate names and weights. Maintain neutral tone. Avoid promotional language. Avoid diagnostic clutter. Do not restate weaknesses in the Strengths section. Do not include variance flags, trim data, bimodal diagnostics, cluster references, aggregation language, evaluative mechanics, or run-level details in the published output. Do not use em dashes.

Bimodal Handling Rule: Treat the addendum as bimodal only if bimodality persists in the retained set. Activate the bimodal output format only when (i) BIMODAL_FLAG = YES and (ii) the evaluation report provides both LOW_MEAN and HIGH_MEAN as explicit numeric values and the retained run totals exhibit two distinct score groupings consistent with those means. If LOW_MEAN or HIGH_MEAN is missing, non-numeric, or not provided, or if the retained run totals form a single tight cluster, output the standard case using MEAN_SCORE_0_55 and omit the bimodal note section even if BIMODAL_FLAG = YES.

Strength Extraction Rules: Standard case (bimodal output not activated): Identify positive structural elements that recur in the retained evaluation set and meet the 50 percent inclusion threshold after trimming. Do not draw from discarded runs. Bimodal case (bimodal output activated): Identify positive structural elements that recur in at least 50 percent of all valid evaluations in the report. Do not draw from invalid evaluations. For both cases: The STRENGTHS section must be written as a direct declarative description of the manuscript’s structure. The STRENGTHS section must describe what the manuscript formulates, defines, derives, constructs, establishes, models, or demonstrates. Do not reference the evaluation, evaluators, scoring process, thresholds, recurrence, inclusion criteria, consensus, assessment, identification, or any evaluative mechanism. Do not use meta-language such as “the evaluation identifies,” “the record shows,” “assessments indicate,” “reviewers note,” “analysis finds,” or similar constructions. Do not imply measurement or scoring within the Strengths narrative. Do not restate weaknesses or mitigation clauses. Remove score references. Render in neutral structural prose focused exclusively on the manuscript itself. Length 4–8 sentences.

Scoring Rules: Use the exact labels “MEALS AGGREGATE (0–55 SCALE)” and “MEALS GATE MEANS.” List gates strictly in M E A L S order. Spell out full gate names. Include weights in parentheses. If unavailable, state “Not provided.” Standard case (bimodal output not activated): MEALS aggregate: Use MEAN_SCORE_0_55. Gate means: Use the aggregate gate means provided in the evaluation report. Bimodal case (bimodal output activated): MEALS aggregates: Use LOW_MEAN and HIGH_MEAN as the two published aggregate results. Gate means: Compute separate gate means for the low-scoring group and the high-scoring group using the per-run gate values in the evaluation report, grouping runs by whether their total score belongs to the low group or the high group as indicated by the report’s low and high means. If the report does not provide enough information to compute a gate mean for a group, state “Not provided” for that group’s gate mean(s).

Gate Definitions and Weights: M (Mathematical Formalism, weight 3); E (Equation and Dimensional Integrity, weight 3); A (Assumption Clarity and Constraints, weight 2); L (Logical Traceability, weight 2); S (Scope Coverage, weight 1).

Output Structure (exact order): Return the entire addendum inside a single triple-backtick code block. Do not include any text before or after the code block. The following sections must appear in order:

STRENGTHS
[narrative block]

MEALS AGGREGATE (0–55 SCALE)
Standard case: [number]
Bimodal case:
Lower Consensus Score: [low_mean]
Higher Consensus Score: [high_mean]

MEALS GATE MEANS
Standard case:
M (Mathematical Formalism, weight 3): [mean]
E (Equation and Dimensional Integrity, weight 3): [mean]
A (Assumption Clarity and Constraints, weight 2): [mean]
L (Logical Traceability, weight 2): [mean]
S (Scope Coverage, weight 1): [mean]
Bimodal case:
Lower Consensus:
M (Mathematical Formalism, weight 3): [mean]
E (Equation and Dimensional Integrity, weight 3): [mean]
A (Assumption Clarity and Constraints, weight 2): [mean]
L (Logical Traceability, weight 2): [mean]
S (Scope Coverage, weight 1): [mean]
Higher Consensus:
M (Mathematical Formalism, weight 3): [mean]
E (Equation and Dimensional Integrity, weight 3): [mean]
A (Assumption Clarity and Constraints, weight 2): [mean]
L (Logical Traceability, weight 2): [mean]
S (Scope Coverage, weight 1): [mean]

NOTE ON EVALUATION STRUCTURE
Standard case: Omit this section. Bimodal case: Include a 2–4 sentence neutral explanatory note describing the structural basis of divergence within the evaluation set. The note must describe the differing structural standards or interpretive emphases that produced the two aggregates, explicitly associate each described interpretive stance with either the lower consensus score or the higher consensus score, avoid reference to runs, trimming, clustering, variance, score distribution, evaluative mechanics, or procedural language, avoid restating weaknesses, avoid privileging either outcome, and frame divergence as differences in structural assessment criteria rather than disagreement over correctness.

At the very end of the code block, include this delimiter exactly as written on its own line followed by a line break:

[[[[=======END OF FINAL CONSOLIDATED OVERVIEW=======]]]]

Transparency Statement

AIPR operates under declared rules and published procedures. The evaluation rubric, scoring gates, aggregation method, and overview generation protocols used to produce each issue are documented on this site. The Review evaluates structural presentation only: mathematical formalism, equation and dimensional integrity, assumption clarity, logical traceability, and scope coverage. It does not determine correctness, scientific importance, or theoretical validity. Institutional affiliation, citation counts, download metrics, and author identity are not used in scoring or selection. Papers are surfaced based on structural readiness under the published protocol during a defined evaluation window. Because evaluation capacity is limited, not all structurally strong papers can be featured in a given issue. Authors who wish to ensure evaluation should submit their work to the AIPR Zenodo community: https://zenodo.org/communities/ai-physics-review/

Declared rules. Visible procedures. Structural evaluation only.

Evaluation Archive

AIPR maintains an internal evaluation archive for each manuscript evaluated under the review protocol. The archive preserves the original manuscript file analyzed, the evaluation artifacts generated during the analysis process, and the published overview record. These materials are retained to preserve procedural reproducibility and to document how each published overview was produced under the declared evaluation framework.

The evaluation archive is not normally public. Because the Review publishes structural overviews rather than critique reports, intermediate evaluation artifacts are retained for audit purposes rather than public commentary. If questions arise regarding procedural integrity, the archive may be consulted to verify that the published record accurately reflects the evaluation protocol applied to the manuscript.

Preserved records. Reproducible procedure. Published results.

Comments, corrections, and suggestions are welcome. AIPR is an experimental publication system, and reader feedback helps improve both the review instrument and the presentation of papers.
Authors requesting a correction or an editorial withdrawal notice should submit requests from the email address associated with their ORCID record. If the author does not have an ORCID account connected to their Zenodo submission, they may contact the curator, who will work with them to verify their identity before processing the request.
Contact: custodianREMOVETHIS@aiphysicsreview.org