Scope note: This guide reflects observed behavior and published limits for ChatGPT and major frontier models as of early 2026. Treat numeric limits as indicative; vendors change them without notice.
Core Principles
- LLMs are predictors, not logic engines. They generate plausible continuations, not guaranteed truths.
- For additional information about how this works, see How LLMs Actually Generate Text (Every Dev Should Know This).
- Think of them as brilliant, distractible assistants. Fast and knowledgeable, but prone to drift, agreement bias, and subtle hallucination.
Tokens & Context
- Context windows continue to expand, but practical reliability still degrades near the ceiling.
- GPT‑5.2 (ChatGPT): ~256k tokens standard context; Thinking variants report tier-dependent effective windows (often ~196k+).
- GPT‑4.1 family: up to ~1M tokens in some enterprise/API contexts.
- Claude 4 (Opus/Sonnet): ~200k–1M tokens depending on plan and variant.
- Gemini 2.5 Flash / Pro: ~1M tokens input; output typically capped ~65k (Flash documented).
- Qwen 2.5 (72B Instruct): ~32k tokens (open model cards).
- Perplexity Sonar: ~128k tokens context reported.
- Context drops still occur when limits are exceeded; older messages are truncated or summarized.
- Best practice: never rely on “infinite” context. Chunk work, summarize checkpoints, and externalize state into documents.
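The chunk-and-checkpoint workflow can be sketched in a few lines of Python. This is a minimal sketch: the 4-characters-per-token heuristic and the `chunk_by_budget` helper are illustrative assumptions, not a real tokenizer; for accurate counts, use the provider's own tokenizer (e.g. tiktoken for OpenAI models).

```python
def estimate_tokens(text: str) -> int:
    # Rough heuristic: ~4 characters per token for English prose.
    # Real tokenizers differ; treat this as an upper-level estimate only.
    return max(1, len(text) // 4)

def chunk_by_budget(paragraphs: list[str], budget_tokens: int = 4000) -> list[str]:
    """Group paragraphs into chunks that stay under a token budget,
    so each chunk can be summarized into an external checkpoint
    instead of relying on the model's full context window."""
    chunks: list[str] = []
    current: list[str] = []
    used = 0
    for para in paragraphs:
        cost = estimate_tokens(para)
        if current and used + cost > budget_tokens:
            chunks.append("\n\n".join(current))
            current, used = [], 0
        current.append(para)
        used += cost
    if current:
        chunks.append("\n\n".join(current))
    return chunks
```

Each chunk can then be summarized separately, with the running summary carried forward as explicit state rather than trusted to remain in context.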
ChatGPT Usage Limits (verified as of early 2026)
| Tier | GPT‑5.2 Standard | GPT‑5.2 Thinking |
|---|---|---|
| Free | ~10 messages per 5 hours, then falls back to a smaller model | ~1 Thinking message/day |
| Plus | ~160 messages per 3 hours (temporary expanded cap) | ~3,000 manual Thinking messages/week |
| Pro / Business / Enterprise | Effectively unlimited (subject to abuse guardrails) | Effectively unlimited (fair‑use enforced) |
Important distinctions
- Manual Thinking requests count toward weekly limits.
- Automatic internal escalation from standard mode does not consume manual Thinking quota.
- UI limits are often lower than API limits.
Model Behavior
What’s Stable
- Focused sessions outperform sprawling threads.
- Explicit structure (steps, constraints, deliverables) improves consistency.
- Repeating key facts or invariants reduces drift.
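One way to apply all three points is to generate prompts from a fixed template, so steps, constraints, and deliverables are restated on every turn. `build_prompt` below is a hypothetical helper for illustration, not any vendor's API:

```python
def build_prompt(task: str, constraints: list[str], deliverables: list[str]) -> str:
    """Assemble a structured prompt. Restating constraints and the
    expected deliverables on every message reduces drift over a session."""
    lines = ["## Task", task, "", "## Constraints (do not violate)"]
    lines += [f"- {c}" for c in constraints]
    lines += ["", "## Deliverables"]
    lines += [f"{i}. {d}" for i, d in enumerate(deliverables, 1)]
    return "\n".join(lines)
```

Because the invariants are regenerated rather than remembered, they survive thread resets and context truncation.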
What’s Changed
- Memory features are now broadly enabled:
- Preferences and style can persist across sessions.
- Memory can usually be reviewed, edited, or wiped in settings.
- Safety tuning is stronger and more dynamic:
- More hedging.
- More refusals around edge cases.
- Behavior can change week to week without version bumps.
Confidence vs Accuracy
- Models remain agreeable and can reinforce your mistakes.
- Hedging has increased, but correctness has not improved proportionally.
Rule of thumb:
- Trust for brainstorming, outlining, summarization, code scaffolding.
- Verify for math, logic, legal, medical, financial, or historical facts.
Choosing the Right Model (2026 snapshot)
| Model | Max Context (approx) | Output Cap | Memory | Tools | Notes |
|---|---|---|---|---|---|
| GPT‑3.5 | ~16k | ~4k–8k | No | Limited | Fast, shallow; largely legacy |
| GPT‑4‑Turbo | ~128k | ~8k–16k | Yes | Yes | Stable baseline |
| GPT‑4.1 family | Up to ~1M (enterprise/API) | ~32k | Yes | Yes | Availability varies by plan |
| GPT‑5.2 | ~256k standard | ~32k (varies) | Yes | Yes | Current flagship |
| GPT‑4o / 4o‑mini | ~128k | Lower in UI | Yes | Yes | Fast, multimodal |
| Claude 4 | ~200k–1M | ~64k | Yes | Varies | Plan‑dependent |
| Gemini 2.5 Flash | ~1M input | ~65k | Yes | Yes | Speed‑optimized, API‑centric |
| Gemini 2.5 Pro | ~1M (reported) | Unclear | Yes | Yes | Output caps vary |
| Qwen 2.5 | ~32k | Unclear | Varies | Varies | Open model cards |
| Perplexity Sonar | ~128k | Unclear | Varies | Yes | Search‑centric |
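A rough pre-flight check against the table above might look like this. The numbers in `CONTEXT_LIMITS` are the approximate figures from the table and will drift, so treat them as placeholders; the model keys are simplified labels, not official identifiers:

```python
# Approximate context windows (tokens), taken from the table above.
# These values change without notice; treat them as illustrative.
CONTEXT_LIMITS = {
    "gpt-5.2": 256_000,
    "gpt-4.1": 1_000_000,
    "claude-4": 200_000,
    "gemini-2.5-flash": 1_000_000,
    "qwen-2.5": 32_000,
}

def fits_in_context(model: str, prompt_tokens: int, reserve_output: int = 8_000) -> bool:
    """Check whether a prompt plus reserved output room fits a model's
    nominal window. Staying well under the ceiling is the safer target,
    since reliability degrades near the limit."""
    limit = CONTEXT_LIMITS.get(model)
    return limit is not None and prompt_tokens + reserve_output <= limit
```

A stricter version might budget against, say, 80% of the nominal window rather than the full figure.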
File & Data Handling
- Uploading many files increases confusion and token pressure.
- Best practice:
- Work with 2–3 files at a time.
- Use staged comparisons (A vs B, then fold in C).
- Modality quality matters:
- OCR errors propagate.
- Transcripts drop nuance.
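The staged-comparison idea above can be sketched as a simple plan generator. `staged_comparisons` is a hypothetical helper that only plans the stages; the actual comparison prompts are up to you:

```python
def staged_comparisons(files: list[str]) -> list[tuple[str, str]]:
    """Plan staged pairwise comparisons: compare the first two files,
    then fold each remaining file into the running result, instead of
    sending every file to the model at once."""
    if len(files) < 2:
        return []
    stages = [(files[0], files[1])]
    for f in files[2:]:
        # "merged-so-far" stands for the summary produced by the prior stage.
        stages.append(("merged-so-far", f))
    return stages
```

Each stage stays within a comfortable token budget, and the intermediate summary becomes explicit state you can inspect and correct.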
Math & Logic (Still Weak Spots)
- Pattern math is strong; symbolic manipulation is fragile.
- Multi‑step reasoning benefits from explicit scaffolding.
Better approach:
- Ask for executable code.
- Use external solvers or calculators for critical results.
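For example, instead of accepting a model's mental arithmetic for a compound-interest figure, compute it locally. This is a generic verification sketch, not tied to any model API:

```python
from decimal import Decimal

def compound_interest(principal: float, annual_rate: float, years: int) -> Decimal:
    """Compute a compounded amount exactly rather than trusting a
    model's arithmetic. Decimal avoids binary-float rounding surprises."""
    p = Decimal(str(principal))
    r = Decimal(str(annual_rate))
    return p * (1 + r) ** years
```

The same pattern applies more broadly: ask the model to produce the code, then run the code yourself and trust its output over the prose.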
Practical Tips
- Specify output shape and depth explicitly.
- Use headings, checkpoints, and acceptance criteria.
- Externalize long‑running work to shared documents.
- Reset threads when contradictions accumulate.
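Acceptance criteria can be checked mechanically before you read a draft closely. `meets_acceptance` is an illustrative helper with assumed criteria (required headings present, word count under a cap); adapt the checks to your own deliverables:

```python
import re

def meets_acceptance(output: str, required_sections: list[str], max_words: int = 500) -> dict:
    """Mechanically screen a model's draft against acceptance criteria:
    are the required sections present, and is the length in bounds?"""
    missing = [s for s in required_sections if s.lower() not in output.lower()]
    word_count = len(re.findall(r"\S+", output))
    return {
        "missing_sections": missing,
        "word_count": word_count,
        "ok": not missing and word_count <= max_words,
    }
```

A failing check means you can re-prompt immediately instead of discovering the gap mid-review.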
Memory & Personalization
- Memory is active by default in many systems.
- Review stored memory periodically.
- Store preferences and durable context only.
- Prune aggressively.
Considerations
- Cost: long contexts are expensive with diminishing returns.
- Privacy: many providers train on chats unless you opt out.
- Safety: guardrails shift without notice.
Bottom Line
- Keep chats focused and state explicit.
- Know your model’s current limits.
- Verify anything that matters.
- Treat AI as an assistant—not a calculator, database, or oracle.