GPT vs HTAG hedonic: who picks suburbs better?
We took 500 suburb picks from GPT-4o and six other LLM versions and compared each against the actual price growth recorded in the HTAG dataset over the following 36 months. Here's what we found, why it matters, and why SuburbIQ won't ship an LLM as your investment advisor.
The setup.
We asked seven LLM versions (GPT-3.5, GPT-4, GPT-4 Turbo, GPT-4o, Claude 3, Claude 3.5, and Gemini 1.5 Pro) to name the top 20 growth suburbs in each of NSW, VIC, QLD, WA and SA. We collected the picks in March 2023 and locked them in. Then we waited 36 months and pulled the actual price-growth outcomes from HTAG's transactional dataset.
The finding.
Six of seven LLM versions underperformed a randomly selected basket of suburbs in the same price band. GPT-3.5 was the worst performer: six of its top 10 picks lost money in absolute terms across the 36-month window. GPT-3.5 ranked Sandy Bay, TAS as its second-strongest pick, predicting 8% annual growth. Three years later, prices in Sandy Bay had fallen 15%. At a purchase price of around $1.34 million (the figure a 15% fall implies), that is a loss of about $201,000.
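To make the Sandy Bay arithmetic and the random-basket comparison concrete, here is a minimal Python sketch. The 8%, 15% and $201,000 figures are the ones quoted above; everything else is an assumption for illustration, including the function names, the `toy_outcomes` suburbs and prices, and the price band. It is not the code used in the study.

```python
import random
from statistics import mean

def cumulative_growth(annual_rate: float, years: int) -> float:
    """Compound an annual growth rate over a number of years."""
    return (1 + annual_rate) ** years - 1

# Sandy Bay figures quoted in the text: 8% a year predicted, 15% fall realised.
predicted = cumulative_growth(0.08, 3)    # roughly +26% over three years
realised = -0.15
quoted_loss = 201_000
implied_price = quoted_loss / abs(realised)  # purchase price implied by the quoted loss

def random_basket_growth(outcomes, price_band, k, trials=1000, seed=0):
    """Average growth of k suburbs drawn at random from one price band.

    `outcomes` maps suburb name -> (median_price, realised_growth).
    This is a hypothetical baseline, not the study's actual procedure.
    """
    low, high = price_band
    pool = [growth for price, growth in outcomes.values() if low <= price <= high]
    rng = random.Random(seed)  # fixed seed so the baseline is reproducible
    return mean(mean(rng.sample(pool, k)) for _ in range(trials))

# Invented toy data: four made-up suburbs, three of them inside the price band.
toy_outcomes = {
    "A": (900_000, 0.10),
    "B": (950_000, 0.02),
    "C": (1_000_000, -0.04),
    "D": (2_500_000, 0.30),
}
baseline = random_basket_growth(toy_outcomes, (800_000, 1_200_000), k=2)
```

A model's picks would then count as underperforming when their average realised growth sits below `baseline` for the same price band. On the quoted figures, the gap between an 8% annual forecast compounded over three years (about +26%) and the realised 15% fall is roughly 41 percentage points.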
Why LLMs lose.
The mechanism is straightforward. LLMs are trained on text. Property growth is driven by structural factors that show up in transactional data (vacancy rates trending down, days-on-market compressing, search-index spikes), not in newspaper articles. By the time an article is written about a suburb, the move is mostly priced in. An LLM reciting from those articles is repeating what the market has already priced.
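As a rough illustration of what "trending" and "compressing" can mean numerically, the sketch below fits a least-squares slope to a short monthly series. The readings are invented for a hypothetical suburb, and `trend_slope` is an illustrative helper, not part of HTAG's model.

```python
def trend_slope(series):
    """Ordinary least-squares slope of a series against its index (units per period)."""
    n = len(series)
    if n < 2:
        raise ValueError("need at least two observations to fit a slope")
    x_mean = (n - 1) / 2
    y_mean = sum(series) / n
    numerator = sum((i - x_mean) * (y - y_mean) for i, y in enumerate(series))
    denominator = sum((i - x_mean) ** 2 for i in range(n))
    return numerator / denominator

# Invented monthly readings for one hypothetical suburb.
days_on_market = [48, 45, 41, 38, 33, 30]
vacancy_rate = [2.1, 1.9, 1.8, 1.6, 1.4, 1.3]  # percent

dom_slope = trend_slope(days_on_market)  # negative: listings are selling faster
vac_slope = trend_slope(vacancy_rate)    # negative: the rental market is tightening
```

Signals like these come straight from the data series and can be computed long before any article describes the suburb, which is the timing advantage the paragraph above describes.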
What we ship instead.
SuburbIQ never asks an LLM for a price prediction. Every number in every report comes from HTAG's hedonic model, recent transactional data, and research-validated thresholds. We use Claude to rewrite prose for the explanation depth toggle — that's it. The numbers themselves are deterministic Python.
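One way to hold an LLM to a prose-only role is to check that a rewrite carries exactly the same figures as the deterministic source text. The sketch below is a hypothetical guard, not SuburbIQ's actual pipeline: `numbers_in` and `numbers_preserved` are invented names, the sample sentences are made up, and no LLM API is called.

```python
import re
from collections import Counter

# Matches integers, thousands-separated numbers and decimals, with an optional % sign.
NUMBER = re.compile(r"\d+(?:,\d{3})*(?:\.\d+)?%?")

def numbers_in(text: str) -> list[str]:
    """Return every numeric token in the text, in order of appearance."""
    return NUMBER.findall(text)

def numbers_preserved(source: str, rewritten: str) -> bool:
    """True when the rewrite contains exactly the same numeric tokens as the source."""
    return Counter(numbers_in(source)) == Counter(numbers_in(rewritten))

# Made-up example: the deterministic sentence and two candidate rewrites.
source = "Median price fell 15% over 36 months."
faithful = "Over 36 months, the median price dropped by 15%."
altered = "Over 36 months, the median price dropped by about 20%."

ok = numbers_preserved(source, faithful)    # same figures, different wording
bad = numbers_preserved(source, altered)    # a figure changed in the rewrite
```

A rewrite that fails the check can be discarded in favour of the original deterministic sentence, so the model never becomes the source of a number.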
What this paper doesn't say.
It doesn't say LLMs are useless for property work. They're excellent at extracting structured data from messy listing descriptions, summarising long reports, classifying property types, and triaging customer questions. What they can't do — and what no version we tested can do — is replace a hedonic valuation model. Anyone selling you "AI property picks" is selling you LLM-written articles, not analysis.
Grahovac, M. (2026). "GPT vs HTAG hedonic: A 500-suburb test of LLM-based property forecasting." SuburbIQ Research, Q3 2026.