DeepSeek V4 Pro beats GPT-5.5 Pro on precision
Article URL: https://runtimewire.com/article/deepseek-v4-pro-beats-gpt-5-5-pro-on-precision
Comments URL: https://news.ycombinator.com/item?id=48440448
Points: 194
Comments: 65
DeepSeek V4 Pro takes this matchup 38.0 to 33.0, and the margin feels earned. Across the scored tasks, the pattern is simple: DeepSeek (Model A in the judge's notes) was tighter, more literal, and more reliable under constraints, while GPT-5.5 Pro (Model B) was good but a little too willing to improvise.
The clearest technical win came in python-log-redactor. DeepSeek handled overlapping patterns the right way: one regex, one replacer, correct priority, no dropped matches. GPT-5.5 Pro split the work across separate regexes, which opens the door to ordering bugs, and its email pattern had small but real flaws around boundaries and over-matching. That is the difference between code that merely looks plausible and code you would actually trust.
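The ordering risk is easy to demonstrate. Below is a minimal sketch with simplified patterns that are our own assumptions (the article does not publish either model's code): when a ticket pass runs before an email pass, an address whose local part looks like a ticket ID gets half-redacted, while a single alternation with a replacer consumes the whole address first.

```python
import re

# Simplified, assumed patterns for illustration only.
EMAIL = r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"
TICKET = r"\bINC-\d{6}\b"
COMBINED = re.compile(rf"(?P<email>{EMAIL})|(?P<ticket>{TICKET})")


def redact_sequential(line: str) -> str:
    # Separate passes: the ticket pass rewrites part of the address,
    # so the email pass no longer sees a complete email and skips it.
    line = re.sub(TICKET, "[TICKET]", line)
    return re.sub(EMAIL, "[EMAIL]", line)


def redact_combined(line: str) -> str:
    # One regex, one replacer: at each position the email alternative is
    # tried first, so the whole address is consumed before TICKET can match.
    return COMBINED.sub(lambda m: "[EMAIL]" if m.group("email") else "[TICKET]", line)


sample = "Reply to INC-123456@help.example.com about INC-654321"
print(redact_sequential(sample))  # Reply to [TICKET]@help.example.com about [TICKET]
print(redact_combined(sample))    # Reply to [EMAIL] about [TICKET]
```

The sequential version leaks the domain; the combined version masks the address as a unit.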
DeepSeek also won the instruction-following tasks by not getting cute. In vendor-delay-update, it did exactly what the prompt asked: ask managers to send daily shortage counts by 4 p.m. local time, in a calm and accountable tone, without bolting on extra process. GPT-5.5 Pro wrote a solid note, but it drifted, adding shift-handoff and escalation details and even redirecting the counts to "Operations Planning." In meeting-notes-summary, the gap was even cleaner: DeepSeek matched the schema exactly, while GPT-5.5 Pro broke it with conditional text in launch_date and an array for blocked_by where a single value was required.
The only draw was messy-orders-to-json, where both models did the unglamorous work correctly: valid JSON, preserved order, correct schema, normalized fields. But a tie on the easy cleanup task does not erase the misses on precision work.
Final call: DeepSeek V4 Pro is the better model here. It was more disciplined, more exact, and more dependable on the tasks where small deviations turn into real failures.
How they were tested
We ran four fresh text tasks, generated on the fly for this matchup so neither model could prepare in advance, and had grok-4-1-fast-non-reasoning score each one. DeepSeek V4 Pro (DeepSeek) scored 38.0; GPT-5.5 Pro (OpenAI) scored 33.0.
1. python-log-redactor
Language: Python 3. Write code only. Implement a function
redact_log(line: str) -> str for an internal support tool. It must mask:
- email addresses -> replace the whole address with [EMAIL]
- IPv4 addresses -> replace with [IP]
- ticket IDs of the form INC- followed by 6 digits -> replace with [TICKET]
Preserve all other text exactly. Do not mask invalid IPs like 999.1.2.3. Assume no multiline input. Include any imports needed and nothing else besides the code.
Winner: DeepSeek: DeepSeek V4 Pro — Model A correctly handles overlapping patterns with a single regex and replacer function, ensuring proper replacement priority and no missed matches. Model B's separate regexes risk incorrect ordering, and its email regex has minor flaws such as missing word boundaries and potential over-matching.
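For reference, here is one way to meet the full prompt with the single-regex, single-replacer approach the judge credited. It is a sketch under stated assumptions: the patterns, the private names (_EMAIL, _OCTET, _IPV4, _TICKET, _PATTERN, _LABELS), and the email syntax it accepts are ours, not DeepSeek's actual output.

```python
import re

# Assumed email syntax: local part, "@", dotted labels, alphabetic TLD.
_EMAIL = r"[A-Za-z0-9._%+-]+@[A-Za-z0-9-]+(?:\.[A-Za-z0-9-]+)*\.[A-Za-z]{2,}"
# One IPv4 octet, 0-255 only, so 999 or 256 can never match.
_OCTET = r"(?:25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)"
_IPV4 = (
    r"(?<!\d)(?<!\d\.)"  # not the tail of a longer dotted number
    + _OCTET
    + r"(?:\." + _OCTET + r"){3}"
    + r"(?!\.?\d)"  # not followed by another digit or ".digit"
)
_TICKET = r"\bINC-\d{6}\b"  # exactly six digits after "INC-"

# Alternation order sets priority when matches could start at the same place.
_PATTERN = re.compile(rf"(?P<email>{_EMAIL})|(?P<ticket>{_TICKET})|(?P<ip>{_IPV4})")
_LABELS = {"email": "[EMAIL]", "ticket": "[TICKET]", "ip": "[IP]"}


def redact_log(line: str) -> str:
    # m.lastgroup names the alternative that matched; other text is untouched.
    return _PATTERN.sub(lambda m: _LABELS[m.lastgroup], line)


print(redact_log("Contact jo.smith@corp.example.com from 10.0.0.12 re INC-004512"))
# Contact [EMAIL] from [IP] re [TICKET]
print(redact_log("bad host 999.1.2.3 kept"))
# bad host 999.1.2.3 kept
```

Putting the email alternative first means INC-123456@help.example.com is masked as [EMAIL] rather than half-masked as a ticket, and the bounded octet pattern plus the digit lookarounds is what leaves 999.1.2.3 alone.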
2. vendor-delay-update
Draft a workplace status update for the VP of Operations to send to regional warehouse managers. Situation: our barcode scanner vendor, North Quay Devices, delayed shipment of 420 replacement units from May 12 to May 19 because of a failed battery certification batch. We have enough spare scanners to cover only the Memphis and Reno sites; Tulsa and Allentown will need to share devices for one week. Ask managers to pause nonessential inventory recounts, prioritize outbound picking, and send daily shortage counts by 4 p.m. local time. Tone: calm, accountable, and practical. Length: 140–180 words.
Winner: DeepSeek: DeepSeek V4 Pro — Model A adheres more closely to the prompt by directly asking managers to 'send daily shortage counts by 4 p.m. local time' to the VP, without adding unprompted details like shift handoffs or escalation instructions, while maintaining a calm, accountable, and practical tone. Model B introduces minor extras and redirects the counts to 'Operations Planning,' slightly deviating from the instructions, though both drafts are high quality and within the word limit.
3. meeting-notes-summary
Read the meeting notes below, then provide: 1) a 2-sentence summary; 2) a JSON object with keys launch_date, owner, blocked_by, open_questions (array), and decisions (array).
Meeting notes:
- Project: Cedar Lane tenant portal refresh
- Maya said legal approved the new lease-upload wording after changing “instant approval” to “faster review.”
- Andre confirmed the frontend is done except for the maintenance banner behavior on iPad Mini.
- Priya wants launch on 2026-03-18, but only if payment autofill passes final QA by the 14th.
- Blocker: finance sandbox is still returning duplicate receipt IDs for ACH retries.
- Decision: remove dark mode from this release and revisit in Q3.
- Decision: keep SMS login, but make email login the default option.
- Open question: should users be able to delete stored bank accounts without calling support?
- Open question: do we localize the late-fee explainer for Quebec French now or after launch?
- Owner for launch checklist: Priya.
Winner: DeepSeek: DeepSeek V4 Pro — A follows the requested schema exactly and provides a clear 2-sentence summary plus correctly typed JSON fields. B’s summary is good, but its JSON does not adhere to the specified structure: launch_date includes extra conditional text and blocked_by is an array instead of a single value.
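A small validator makes the judge's complaint concrete. This is a hypothetical sketch: the prompt only says open_questions and decisions are arrays, so treating launch_date as a bare YYYY-MM-DD string and owner and blocked_by as single strings is our reading of the judge's notes. The sample inputs are made up from the meeting notes, not either model's actual output.

```python
import json
import re

# Assumed expectations, following the judge's reading of the prompt.
REQUIRED_KEYS = {"launch_date", "owner", "blocked_by", "open_questions", "decisions"}


def schema_problems(raw: str) -> list[str]:
    """Return schema violations; an empty list means the object passes."""
    data = json.loads(raw)
    problems = []
    if set(data) != REQUIRED_KEYS:
        problems.append(f"key mismatch: {sorted(set(data) ^ REQUIRED_KEYS)}")
    if not re.fullmatch(r"\d{4}-\d{2}-\d{2}", str(data.get("launch_date", ""))):
        problems.append("launch_date is not a bare YYYY-MM-DD date")
    for key in ("owner", "blocked_by"):
        if not isinstance(data.get(key), str):
            problems.append(f"{key} should be a single string")
    for key in ("open_questions", "decisions"):
        if not isinstance(data.get(key), list):
            problems.append(f"{key} should be an array")
    return problems


good = {
    "launch_date": "2026-03-18",
    "owner": "Priya",
    "blocked_by": "Finance sandbox returns duplicate receipt IDs for ACH retries",
    "open_questions": [
        "Can users delete stored bank accounts without calling support?",
        "Localize the late-fee explainer for Quebec French now or after launch?",
    ],
    "decisions": [
        "Remove dark mode from this release; revisit in Q3",
        "Keep SMS login; make email login the default",
    ],
}
bad = dict(
    good,
    launch_date="2026-03-18 if payment autofill passes QA by 2026-03-14",
    blocked_by=["Finance sandbox duplicate receipt IDs"],
)

print(schema_problems(json.dumps(good)))  # []
print(schema_problems(json.dumps(bad)))
# ['launch_date is not a bare YYYY-MM-DD date', 'blocked_by should be a single string']
```

The conditional launch condition arguably belongs in the summary or in open_questions, which keeps launch_date machine-readable.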
4. messy-orders-to-json
Convert the messy order lines below into valid JSON as an array of objects. Use exactly this schema for each object and preserve input order:
{"order_id": string, "customer": string, "items": [{"sku": string, "qty": integer}], "priority": boolean, "ship_by": string|null}
Rules:
- Normalize priority to true/false.
- Normalize missing ship date words like none, tbd, or - to null.
- Trim spaces around values.
- items are separated by ; and each item is SKU xQTY.
Data:
Order=QX-1042 | customer: Larkspur Clinic | items: AB-9 x2; TT-41 x1 | priority=YES | ship_by=2026-07-02
Order=QX-1043|customer: Mira & Son Catering|items: P-88 x12 |priority=no|ship_by=none
Order = QX-1044 | customer: Oak Route Studio | items: ZK-2 x3; MN-7 x5; MN-7 x1 | priority = true | ship_by = TBD
Order=QX-1045 | customer: Heliotrope Labs | items: R1 x1 | priority = false | ship_by = 2026-07-05
Winner: Tie — Both outputs are valid JSON, preserve input order, match the required schema exactly, and correctly normalize priority and ship_by values. There are no substantive differences in quality or correctness between them.
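The tie is unsurprising because the rules are mechanical. As a rough illustration, a few lines of deterministic Python can apply them to the same four lines; the parsing approach and the helper names (parse_order, TRUE_WORDS, FALSE_WORDS, NULL_DATES) are our assumptions, not what either model produced.

```python
import json
import re

TRUE_WORDS, FALSE_WORDS = {"yes", "true"}, {"no", "false"}
NULL_DATES = {"none", "tbd", "-"}  # assumed set of "missing date" words

RAW = """\
Order=QX-1042 | customer: Larkspur Clinic | items: AB-9 x2; TT-41 x1 | priority=YES | ship_by=2026-07-02
Order=QX-1043|customer: Mira & Son Catering|items: P-88 x12 |priority=no|ship_by=none
Order = QX-1044 | customer: Oak Route Studio | items: ZK-2 x3; MN-7 x5; MN-7 x1 | priority = true | ship_by = TBD
Order=QX-1045 | customer: Heliotrope Labs | items: R1 x1 | priority = false | ship_by = 2026-07-05"""


def parse_order(line: str) -> dict:
    # Split "key=value" / "key: value" fields on "|", tolerating stray spaces.
    fields = {}
    for part in line.split("|"):
        key, value = re.split(r"\s*[=:]\s*", part.strip(), maxsplit=1)
        fields[key.lower()] = value.strip()

    # Each item looks like "SKU xQTY"; a malformed item raises AttributeError here.
    items = []
    for chunk in fields["items"].split(";"):
        sku, qty = re.fullmatch(r"(\S+)\s+x(\d+)", chunk.strip()).groups()
        items.append({"sku": sku, "qty": int(qty)})

    flag = fields["priority"].lower()
    if flag not in TRUE_WORDS | FALSE_WORDS:
        raise ValueError(f"unrecognized priority: {fields['priority']!r}")

    ship_by = fields["ship_by"]
    return {
        "order_id": fields["order"],
        "customer": fields["customer"],
        "items": items,
        "priority": flag in TRUE_WORDS,
        "ship_by": None if ship_by.lower() in NULL_DATES else ship_by,
    }


orders = [parse_order(line) for line in RAW.splitlines()]
print(json.dumps(orders, indent=2))
```

Duplicate SKU entries (MN-7 appears twice in QX-1044) stay as separate items, since the rules say nothing about merging them.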
Originally published on Hacker News (Best)


