My Cheatsheets

Eliciting Deeper Thinking for Code Projects


Scope note: This cheatsheet reflects observed behavior in ChatGPT models. It does not generalize to all LLMs.

Stability note: Model names, limits, and UI signals change. Treat specific numbers and badges as indicative, not contractual.

With the release of the GPT-5 model in August 2025, OpenAI updated usage limits—especially for deeper Thinking modes. The guidance below focuses on reliably triggering deeper internal reasoning without unnecessarily consuming manual Thinking quotas.

ChatGPT (web/mobile) – observed usage tiers

| Tier | GPT-5 Standard | GPT-5 Thinking (deeper reasoning) |
| --- | --- | --- |
| Free | ~10 messages / 5 hours; then fallback to a smaller model | ~1 Thinking message/day |
| Plus | ~160 messages / 3 hours (temporarily elevated at time of review) | ~200 manual Thinking messages/week; auto-escalation does not count |
| Pro/Team | Effectively unlimited standard usage (subject to abuse guardrails) | Access to extended/pro Thinking variants |

Key distinction

If a standard GPT-5 request internally escalates, you benefit from deeper reasoning without spending a manual Thinking slot.

Heuristics that often correlate with deeper internal reasoning

These are correlations, not guarantees:

  • asking for ranked hypotheses with probabilities instead of a single answer,
  • requiring falsification experiments or counterexamples,
  • stating invariants, constraints, and acceptance criteria up front,
  • pairing every proposed change with tests and a risk list,
  • deferring code until the analysis is complete.

Do not assume you can reliably detect or force internal escalation. The goal is to increase likelihood, not control it.

General prompt structure

“You are acting as a senior [language/framework] engineer. Task: [clear outcome]. Context: [repo summary / constraints / runtime / env]. Inputs: [code snippets, error logs, benchmarks]. Requirements: [functional + non-functional]. Deliverables: [analysis, plan, patch, tests, risks, alternatives]. Evaluate invariants, edge cases, trade-offs, and failure modes before proposing code.”

Why it helps: explicit constraints combined with evaluation criteria tend to trigger internal depth.


1) Bug triage & minimal repro

Template

“Given this failing behavior [symptoms/logs], rank likely root causes by probability. Produce a minimal reproducible example in [language/tooling]. For each cause, show a falsification experiment, then propose the smallest patch and a regression test.”

Signals that correlate with depth: ranking, falsification, a minimal reproducible example (MRE), and patch + test pairing.
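As an illustration of the patch + regression test pairing, here is a minimal Go sketch built around a hypothetical off-by-one bug; the `pageCount` function and its inputs are invented for the example:

```go
package main

import "fmt"

// pageCount returns how many pages are needed to show n items at
// pageSize items per page. The hypothetical buggy version used
// n / pageSize, which under-counts when n is not an exact multiple
// of pageSize.
func pageCount(n, pageSize int) int {
	if pageSize <= 0 {
		return 0 // guard against a degenerate page size
	}
	// Smallest patch: ceiling division instead of floor division.
	return (n + pageSize - 1) / pageSize
}

func main() {
	// Minimal reproducible example: 101 items, 10 per page.
	// The buggy version returned 10; the patched version returns 11.
	fmt.Println(pageCount(101, 10)) // 11
	fmt.Println(pageCount(100, 10)) // 10
	fmt.Println(pageCount(0, 10))   // 0
}
```

The regression test then pins exactly the case the MRE exposed, so the bug cannot silently return.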


2) Spec → plan → interfaces (no code first)

Template

“Translate this feature request [spec/user story] into:

  1. explicit invariants and pre/post-conditions,
  2. module boundaries and public interfaces,
  3. a stepwise implementation plan with checkpoints and rollback. Highlight ambiguities and propose clarifying questions. Do not emit code until the analysis is complete.”
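To make step 1 concrete, pre/post-conditions can be written down as executable checks before any feature code exists. A minimal Go sketch, assuming a hypothetical Account module (names and amounts are invented):

```go
package main

import (
	"errors"
	"fmt"
)

// Account is a hypothetical module boundary from step 2 of the plan.
// Invariant: balance never goes negative.
type Account struct {
	balance int64 // cents
}

// Withdraw documents its contract explicitly:
// pre:  amount > 0 and amount <= balance
// post: new balance == old balance - amount; on error, no change
func (a *Account) Withdraw(amount int64) error {
	if amount <= 0 {
		return errors.New("pre-condition violated: amount must be positive")
	}
	if amount > a.balance {
		return errors.New("pre-condition violated: insufficient funds")
	}
	a.balance -= amount
	return nil
}

func main() {
	a := &Account{balance: 500}
	fmt.Println(a.Withdraw(200), a.balance) // <nil> 300
	fmt.Println(a.Withdraw(900))            // insufficient funds error
}
```

Writing the contract as code first makes the later implementation plan checkable at every checkpoint.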

3) Defensive test design

Template

“Design a test suite for [component] covering:

  • happy paths and boundary values,
  • property-based and adversarial inputs,
  • performance guards (time/memory thresholds),
  • concurrency or race conditions if applicable. Deliverables: test-matrix table, example inputs/expected outputs, and rationale per case.”

Optional add-on:

“Convert the matrix into [framework] test stubs.”
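A test-matrix row translates naturally into Go's table-driven style. A sketch with an invented `normalizeID` component under test (case names, inputs, and rationale column are illustrative):

```go
package main

import (
	"fmt"
	"strings"
)

// normalizeID is a stand-in component under test (hypothetical).
func normalizeID(s string) string {
	return strings.ToLower(strings.TrimSpace(s))
}

// row mirrors one line of the test-matrix table: case name, input,
// expected output, and the rationale column.
type row struct {
	name, in, want, rationale string
}

var matrix = []row{
	{"happy path", "User42", "user42", "basic lowering"},
	{"boundary: empty", "", "", "zero-length input"},
	{"adversarial: whitespace", "  A  ", "a", "leading/trailing spaces"},
}

// runMatrix executes every row and returns the number of failures.
func runMatrix() int {
	failures := 0
	for _, r := range matrix {
		if got := normalizeID(r.in); got != r.want {
			fmt.Printf("FAIL %s (%s): got %q want %q\n", r.name, r.rationale, got, r.want)
			failures++
		}
	}
	return failures
}

func main() {
	fmt.Println("failures:", runMatrix()) // failures: 0
}
```

Keeping the rationale in the table means the "rationale per case" deliverable survives into the code review.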


4) Performance analysis

Template

“Given [benchmarks/profiles], identify the true bottleneck. Compare at least three optimization strategies (algorithmic, data-structure, system-level). Include complexity analysis, expected absolute wins, and a guardrail benchmark with acceptance thresholds.”
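A guardrail benchmark can be as small as a timed wrapper with an acceptance threshold. A Go sketch (the budget and workload here are illustrative placeholders, not recommended values; real guardrails belong in `testing.B` benchmarks):

```go
package main

import (
	"fmt"
	"sort"
	"time"
)

// guardrail runs fn and reports whether it finished within budget.
// This is the acceptance-threshold idea in its smallest form.
func guardrail(name string, budget time.Duration, fn func()) bool {
	start := time.Now()
	fn()
	elapsed := time.Since(start)
	ok := elapsed <= budget
	fmt.Printf("%s: %v (budget %v, ok=%v)\n", name, elapsed, budget, ok)
	return ok
}

func main() {
	// Illustrative workload: sort 100k descending ints.
	data := make([]int, 100_000)
	for i := range data {
		data[i] = len(data) - i
	}
	guardrail("sort 100k ints", 2*time.Second, func() {
		sort.Ints(data)
	})
}
```

The point of the guardrail is that an optimization PR must move the measured number, not just the narrative.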


5) Refactor with safety

Template

“Refactor [module/path] to improve [maintainability/cohesion/complexity]. Constraints: zero behavior change; public API stable. Deliverables: refactor map (before → after), risk list, and a safety net (snapshot, golden, or contract tests). Stage the work across N small PRs. Do not emit code until risks and safety nets are defined.”
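One cheap safety net is a golden test: capture outputs of the pre-refactor code and diff against them afterwards. A minimal Go sketch with a hypothetical `renderReport` function (in practice the snapshots would live in files, not a map):

```go
package main

import "fmt"

// renderReport is the function being refactored (hypothetical).
func renderReport(user string, items int) string {
	return fmt.Sprintf("report for %s: %d items", user, items)
}

// golden holds snapshots captured from the pre-refactor implementation.
// If the refactor changes any output, the safety net trips.
var golden = map[string]string{
	"alice/3": "report for alice: 3 items",
	"bob/0":   "report for bob: 0 items",
}

// checkGolden re-runs the captured inputs and returns any diffs.
func checkGolden() []string {
	inputs := []struct {
		key  string
		user string
		n    int
	}{{"alice/3", "alice", 3}, {"bob/0", "bob", 0}}
	var diffs []string
	for _, in := range inputs {
		if got := renderReport(in.user, in.n); got != golden[in.key] {
			diffs = append(diffs, in.key+": "+got)
		}
	}
	return diffs
}

func main() {
	fmt.Println("diffs:", len(checkGolden())) // diffs: 0
}
```

Because the snapshots were captured before the refactor began, "zero behavior change" becomes a mechanical check rather than a reviewer's judgment call.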


6) Concurrency & correctness

Template

“For [concurrent/async] code, enumerate interleavings that violate invariants. Provide a happens-before diagram. Identify deadlock, livelock, and starvation risks. Propose a synchronization strategy and justify it with contention analysis.”
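The classic interleaving this template surfaces is the two-lock deadlock: goroutine A holds lock 1 and waits for lock 2 while goroutine B holds lock 2 and waits for lock 1. The standard synchronization strategy is a total lock order. A Go sketch with invented account types:

```go
package main

import (
	"fmt"
	"sync"
)

// account carries its own mutex; concurrent transfers between two
// accounts can deadlock if locks are taken in arbitrary order.
type account struct {
	id      int
	mu      sync.Mutex
	balance int
}

// transfer imposes a total order on lock acquisition (lower ID first),
// which makes the deadlocking interleaving impossible.
func transfer(from, to *account, amount int) {
	first, second := from, to
	if second.id < first.id {
		first, second = second, first
	}
	first.mu.Lock()
	defer first.mu.Unlock()
	second.mu.Lock()
	defer second.mu.Unlock()
	from.balance -= amount
	to.balance += amount
}

func main() {
	a := &account{id: 1, balance: 100}
	b := &account{id: 2, balance: 100}
	var wg sync.WaitGroup
	for i := 0; i < 100; i++ {
		wg.Add(2)
		go func() { defer wg.Done(); transfer(a, b, 1) }()
		go func() { defer wg.Done(); transfer(b, a, 1) }()
	}
	wg.Wait()
	// Invariant: total money is conserved and no interleaving deadlocks.
	fmt.Println(a.balance + b.balance) // 200
}
```

Run it with `go run -race` to also confirm the data-race freedom the happens-before diagram should predict.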


7) API design review (backwards compatibility)

Template

“Evaluate this API [signature/examples] for ergonomics, consistency, discoverability, error surface, and evolution. Propose a deprecation path and versioning policy. Include adapters or shims for [old → new], with examples that make misuse difficult.”
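An adapter for [old → new] often reduces to a deprecated shim that delegates to the new signature. A Go sketch with hypothetical `Fetch` APIs (the options struct is the "hard to misuse" evolution path; names are invented):

```go
package main

import "fmt"

// FetchOptions is the new API surface: named fields make call sites
// self-documenting and leave room to evolve without breaking signatures.
type FetchOptions struct {
	Timeout int // seconds
	Retries int
}

// Fetch is the new entry point (a stub that only formats its inputs).
func Fetch(url string, opts FetchOptions) string {
	return fmt.Sprintf("fetch %s timeout=%d retries=%d", url, opts.Timeout, opts.Retries)
}

// Deprecated: FetchWithTimeout is the old positional signature, kept as
// a thin shim so existing callers can migrate on their own schedule.
func FetchWithTimeout(url string, timeoutSec int) string {
	return Fetch(url, FetchOptions{Timeout: timeoutSec})
}

func main() {
	fmt.Println(FetchWithTimeout("https://example.com", 5))
	fmt.Println(Fetch("https://example.com", FetchOptions{Timeout: 5, Retries: 2}))
}
```

The `Deprecated:` comment convention is picked up by Go tooling, so the deprecation path is visible at every remaining call site.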


8) Migration or rewrite plan

Template

“Plan migration from [X] to [Y]. Map data or schema transforms, compatibility layers, dual-read/dual-write strategy, and cutover criteria. Identify irreversible steps and a rollback plan. Provide a milestone timeline with measurable gates. Do not emit code until the plan is validated.”
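A dual-read layer is the piece most migrations share: prefer the new store, fall back to the old one, and surface mismatches to build cutover confidence. A Go sketch over in-memory maps (real stores and the mismatch-reporting channel would of course differ):

```go
package main

import "fmt"

// store stands in for an arbitrary key-value backend.
type store map[string]string

// dualRead prefers the new store, falls back to the old one, and tags
// each read with its provenance so mismatches feed the cutover report.
func dualRead(oldS, newS store, key string) (string, string) {
	if v, ok := newS[key]; ok {
		if ov, ok := oldS[key]; ok && ov != v {
			return v, "mismatch" // disagreement: block cutover until explained
		}
		return v, "new"
	}
	return oldS[key], "fallback-old" // not yet migrated
}

func main() {
	oldS := store{"a": "1", "b": "2"}
	newS := store{"a": "1"}
	v, src := dualRead(oldS, newS, "b")
	fmt.Println(v, src) // 2 fallback-old
}
```

A measurable gate then falls out naturally: cut over only when the fallback and mismatch rates drop below agreed thresholds.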


9) Security & threat modeling (lightweight)

Template

“Perform a lightweight threat model for [component] (STRIDE-lite). List assets, trust boundaries, and the top five concrete threats with exploit sketches. Recommend mitigations with cost/impact trade-offs and note residual risk.”


10) Code review with rationale

Template

“Review this diff [patch]. Classify findings: correctness, performance, readability, testability, security. For each, provide a one-sentence rationale and a minimal fix snippet. End with a risk summary and an approve / request-changes recommendation.”


Phrases that often nudge deeper reasoning

  • “rank likely root causes by probability”
  • “show a falsification experiment”
  • “evaluate invariants, edge cases, trade-offs, and failure modes”
  • “compare at least three optimization strategies”
  • “do not emit code until the analysis is complete”


When not to trigger deeper reasoning

Avoid explicit depth for:

  • trivial edits, renames, and formatting changes,
  • boilerplate or scaffolding generation,
  • simple factual lookups with one obvious answer.

Over-triggering depth increases latency without improving outcomes.


When to spend manual Thinking

Use it deliberately when:

  • the stakes are high (architecture decisions, security reviews, migrations with irreversible steps),
  • auto-escalation has not produced the depth the problem needs,
  • the added latency and weekly quota cost are acceptable.


Example (drop-in)

“You are a senior Go engineer. Task: intermittent deadlock in the job scheduler. Context: Go 1.22, Linux, 16-core system; worker pool + buffered channels. Inputs: [stack traces / pprof]. Requirements: no functional regressions; handle 50k jobs/min; p95 latency < 150ms. Deliverables:

  1. ranked root causes with falsification experiments,
  2. minimal repro,
  3. proposed fix with happens-before explanation,
  4. test plan covering races and starvation,
  5. rollback plan.”
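For deliverable 2, a minimal repro of a channel-based stall often looks like this: turn a silent blocking send into an observable timeout. A Go sketch, deliberately simplified far below a real scheduler (the channel shapes are invented for illustration):

```go
package main

import (
	"fmt"
	"time"
)

// sendWithTimeout attempts a channel send but gives up after d.
// Turning a silent hang into a boolean makes the deadlock symptom
// reproducible and falsifiable, which is the shape of a useful MRE.
func sendWithTimeout(ch chan int, v int, d time.Duration) bool {
	select {
	case ch <- v:
		return true
	case <-time.After(d):
		return false // would have blocked forever: symptom reproduced
	}
}

func main() {
	// Repro: an unbuffered result channel with no reader blocks the
	// producer, the scheduler-stall symptom in miniature.
	stuck := make(chan int)
	fmt.Println("send completed:", sendWithTimeout(stuck, 1, 50*time.Millisecond)) // false

	// Candidate smallest patch: give the channel capacity. A real fix
	// would also need the happens-before argument from deliverable 3.
	buffered := make(chan int, 1)
	fmt.Println("send completed:", sendWithTimeout(buffered, 1, 50*time.Millisecond)) // true
}
```

Each candidate root cause gets its own variant of this repro; the one whose fix flips the boolean is the one the falsification experiment confirms.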