Scope note: This guide reflects observed behavior and published limits for ChatGPT and major frontier models as of early 2026. Treat numeric limits as indicative; vendors change them without notice.
Core Principles
- LLMs are predictors, not logic engines. They generate plausible continuations, not guaranteed truths.
- For additional information about how this works, see How LLMs Actually Generate Text (Every Dev Should Know This).
- Think of them as brilliant, distractible assistants. Fast and knowledgeable, but prone to drift, agreement bias, and subtle hallucination.
Tokens & Context
- Context windows continue to expand, but practical reliability still degrades near the ceiling.
- GPT‑5.2 (ChatGPT): ~256k tokens standard context; Thinking variants report tier-dependent effective windows (often ~196k+).
- GPT‑4.1 family: up to ~1M tokens in some enterprise/API contexts.
- Claude 4 (Opus/Sonnet): ~200k–1M tokens depending on plan and variant.
- Gemini 2.5 Flash / Pro: ~1M tokens input; output typically capped ~65k (Flash documented).
- Qwen 2.5 (72B Instruct): ~32k tokens (open model cards).
- Perplexity Sonar: ~128k tokens context reported.
- Context drops still occur when limits are exceeded; older messages are truncated or summarized.
- Best practice: never rely on “infinite” context. Chunk work, summarize checkpoints, and externalize state into documents.
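The chunk-and-checkpoint workflow can be sketched in a few lines of Python. This is a minimal sketch: the 4-characters-per-token heuristic and the `chunk_by_budget` helper are illustrative assumptions, not a real tokenizer; for accurate counts, use the provider's own tokenizer (e.g. tiktoken for OpenAI models).

```python
def estimate_tokens(text: str) -> int:
    # Rough heuristic: ~4 characters per token for English prose.
    # Real tokenizers differ; treat this as an upper-level estimate only.
    return max(1, len(text) // 4)

def chunk_by_budget(paragraphs: list[str], budget_tokens: int = 4000) -> list[str]:
    """Group paragraphs into chunks that stay under a token budget,
    so each chunk can be summarized into an external checkpoint
    instead of relying on the model's full context window."""
    chunks: list[str] = []
    current: list[str] = []
    used = 0
    for para in paragraphs:
        cost = estimate_tokens(para)
        if current and used + cost > budget_tokens:
            chunks.append("\n\n".join(current))
            current, used = [], 0
        current.append(para)
        used += cost
    if current:
        chunks.append("\n\n".join(current))
    return chunks
```

Each chunk can then be summarized separately, with the running summary carried forward as explicit state rather than trusted to remain in context.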
ChatGPT Usage Limits (verified as of early 2026)
| Tier | GPT‑5.2 Standard | GPT‑5.2 Thinking |
|---|---|---|
| Free | ~10 messages per 5 hours, then falls back to a smaller model | ~1 Thinking message/day |
| Plus | ~160 messages per 3 hours (temporary expanded cap) | ~3,000 manual Thinking messages/week |
| Pro / Business / Enterprise | Effectively unlimited (subject to abuse guardrails) | Effectively unlimited (fair‑use enforced) |
Important distinctions
- Manual Thinking requests count toward weekly limits.
- Automatic internal escalation from standard mode does not consume manual Thinking quota.
- UI limits are often lower than API limits.
Model Behavior
What’s Stable
- Focused sessions outperform sprawling threads.
- Explicit structure (steps, constraints, deliverables) improves consistency.
- Repeating key facts or invariants reduces drift.
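One way to apply all three points is to generate prompts from a fixed template, so steps, constraints, and deliverables are restated on every turn. `build_prompt` below is a hypothetical helper for illustration, not any vendor's API:

```python
def build_prompt(task: str, constraints: list[str], deliverables: list[str]) -> str:
    """Assemble a structured prompt. Restating constraints and the
    expected deliverables on every message reduces drift over a session."""
    lines = ["## Task", task, "", "## Constraints (do not violate)"]
    lines += [f"- {c}" for c in constraints]
    lines += ["", "## Deliverables"]
    lines += [f"{i}. {d}" for i, d in enumerate(deliverables, 1)]
    return "\n".join(lines)
```

Because the invariants are regenerated rather than remembered, they survive thread resets and context truncation.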
What’s Changed
- Memory features are now broadly enabled:
- Preferences and style can persist across sessions.
- Memory can usually be reviewed, edited, or wiped in settings.
- Safety tuning is stronger and more dynamic:
- More hedging.
- More refusals around edge cases.
- Behavior can change week to week without version bumps.
Confidence vs Accuracy
- Models remain agreeable and can reinforce your mistakes.
- Hedging has increased, but correctness has not improved proportionally.
Rule of thumb:
- Trust for brainstorming, outlining, summarization, code scaffolding.
- Verify for math, logic, legal, medical, financial, or historical facts.
Choosing the Right Model (2026 snapshot)
| Model | Max Context (approx) | Output Cap | Memory | Tools | Notes |
|---|---|---|---|---|---|
| GPT‑3.5 | ~16k | ~4k–8k | No | Limited | Fast, shallow; largely legacy |
| GPT‑4‑Turbo | ~128k | ~8k–16k | Yes | Yes | Stable baseline |
| GPT‑4.1 family | Up to ~1M (enterprise/API) | ~32k | Yes | Yes | Availability varies by plan |
| GPT‑5.2 | ~256k standard | ~32k (varies) | Yes | Yes | Current flagship |
| GPT‑4o / 4o‑mini | ~128k | Lower in UI | Yes | Yes | Fast, multimodal |
| Claude 4 | ~200k–1M | ~64k | Yes | Varies | Plan‑dependent |
| Gemini 2.5 Flash | ~1M input | ~65k | Yes | Yes | Speed‑optimized, API‑centric |
| Gemini 2.5 Pro | ~1M (reported) | Unclear | Yes | Yes | Output caps vary |
| Qwen 2.5 | ~32k | Unclear | Varies | Varies | Open model cards |
| Perplexity Sonar | ~128k | Unclear | Varies | Yes | Search‑centric |
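A rough pre-flight check against the table above might look like this. The numbers in `CONTEXT_LIMITS` are the approximate figures from the table and will drift, so treat them as placeholders; the model keys are simplified labels, not official identifiers:

```python
# Approximate context windows (tokens), taken from the table above.
# These values change without notice; treat them as illustrative.
CONTEXT_LIMITS = {
    "gpt-5.2": 256_000,
    "gpt-4.1": 1_000_000,
    "claude-4": 200_000,
    "gemini-2.5-flash": 1_000_000,
    "qwen-2.5": 32_000,
}

def fits_in_context(model: str, prompt_tokens: int, reserve_output: int = 8_000) -> bool:
    """Check whether a prompt plus reserved output room fits a model's
    nominal window. Staying well under the ceiling is the safer target,
    since reliability degrades near the limit."""
    limit = CONTEXT_LIMITS.get(model)
    return limit is not None and prompt_tokens + reserve_output <= limit
```

A stricter version might budget against, say, 80% of the nominal window rather than the full figure.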
File & Data Handling
- Uploading many files increases confusion and token pressure.
- Best practice:
- Work with 2–3 files at a time.
- Use staged comparisons (A vs B, then fold in C).
- Modality quality matters:
- OCR errors propagate.
- Transcripts drop nuance.
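The staged-comparison idea above can be sketched as a simple plan generator. `staged_comparisons` is a hypothetical helper that only plans the stages; the actual comparison prompts are up to you:

```python
def staged_comparisons(files: list[str]) -> list[tuple[str, str]]:
    """Plan staged pairwise comparisons: compare the first two files,
    then fold each remaining file into the running result, instead of
    sending every file to the model at once."""
    if len(files) < 2:
        return []
    stages = [(files[0], files[1])]
    for f in files[2:]:
        # "merged-so-far" stands for the summary produced by the prior stage.
        stages.append(("merged-so-far", f))
    return stages
```

Each stage stays within a comfortable token budget, and the intermediate summary becomes explicit state you can inspect and correct.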
Math & Logic (Still Weak Spots)
- Pattern math is strong; symbolic manipulation is fragile.
- Multi‑step reasoning benefits from explicit scaffolding.
Better approach:
- Ask for executable code.
- Use external solvers or calculators for critical results.
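For example, instead of accepting a model's mental arithmetic for a compound-interest figure, compute it locally. This is a generic verification sketch, not tied to any model API:

```python
from decimal import Decimal

def compound_interest(principal: float, annual_rate: float, years: int) -> Decimal:
    """Compute a compounded amount exactly rather than trusting a
    model's arithmetic. Decimal avoids binary-float rounding surprises."""
    p = Decimal(str(principal))
    r = Decimal(str(annual_rate))
    return p * (1 + r) ** years
```

The same pattern applies more broadly: ask the model to produce the code, then run the code yourself and trust its output over the prose.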
Practical Tips
- Specify output shape and depth explicitly.
- Use headings, checkpoints, and acceptance criteria.
- Externalize long‑running work to shared documents.
- Reset threads when contradictions accumulate.
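Acceptance criteria can be checked mechanically before you read a draft closely. `meets_acceptance` is an illustrative helper with assumed criteria (required headings present, word count under a cap); adapt the checks to your own deliverables:

```python
import re

def meets_acceptance(output: str, required_sections: list[str], max_words: int = 500) -> dict:
    """Mechanically screen a model's draft against acceptance criteria:
    are the required sections present, and is the length in bounds?"""
    missing = [s for s in required_sections if s.lower() not in output.lower()]
    word_count = len(re.findall(r"\S+", output))
    return {
        "missing_sections": missing,
        "word_count": word_count,
        "ok": not missing and word_count <= max_words,
    }
```

A failing check means you can re-prompt immediately instead of discovering the gap mid-review.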
Memory & Personalization
- Memory is active by default in many systems.
- Review stored memory periodically.
- Store preferences and durable context only.
- Prune aggressively.
Considerations
- Cost: long contexts are expensive with diminishing returns.
- Privacy: many providers train on chats unless you opt out.
- Safety: guardrails shift without notice.
Bottom Line
- Keep chats focused and state explicit.
- Know your model’s current limits.
- Verify anything that matters.
- Treat AI as an assistant—not a calculator, database, or oracle.