Eight weeks from first call to live product. A three-person team. A fintech space that's crowded, regulated, and unforgiving of poor UX. Here's how Finly went from idea to launch — and what we learned along the way.
The Brief
The client came to us with a clear problem statement: personal finance apps are either too simple (they just show you a graph of your spending) or too complex (accountant-grade tools most people don't understand). They wanted something in the middle — an app that could answer plain-English questions about your money.
Core requirement: "Ask me anything about my finances" — a conversational AI layer on top of personal finance data.
Secondary requirements:
- Bank account connection (read-only)
- Automatic transaction categorisation
- Budget tracking with smart alerts
- Monthly summary and insights
- Mobile-first (iOS and Android)
Week 1–2: Scoping and Architecture
We spent the first two weeks on the things that save weeks later: clear scope, technical architecture, and a working skeleton.
Scope decisions we made early:
- Read-only bank data only. No payments, no transfers. This halved the compliance surface area.
- One country, one currency. Internationalisation deferred to v2.
- Web app first, native shell second. A React Native WebView wrapper gave us one codebase with a native feel.
Architecture choices:
- Backend: Node.js + PostgreSQL on GCP Cloud Run (serverless, scales to zero)
- Bank connectivity: Plaid API (fastest path to real bank data)
- AI layer: Gemini 1.5 Flash with a structured system prompt and user's transaction history as context
- Mobile: React Native with Expo
- Auth: Supabase (handles auth + row-level security out of the box)
Week 3–4: Core Data Pipeline
The heart of Finly is the data pipeline: connect bank → fetch transactions → categorise → store → make queryable.
Transaction categorisation was where we spent the most time. Bank transaction descriptions are notoriously messy — "AMZN*AB1234" isn't obviously Amazon to a naive classifier. We used a two-step approach:
- Rule-based matching for known merchant patterns (covers ~70% of transactions)
- LLM classification for the remaining ~30%, with the result cached against the merchant string
Because results were cached, each distinct merchant string only had to go to the LLM once. That kept per-user AI costs low while keeping categorisation accuracy high.
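A minimal sketch of the two-step approach, not Finly's production code. The category names, the two merchant rules, the in-memory cache, and the `LlmClassifier` stand-in are all assumptions made for illustration; a real system would likely persist the cache and load rules from data.

```typescript
// Hypothetical category labels and merchant rules, for illustration only.
type Category = "shopping" | "groceries" | "transport" | "other";

interface MerchantRule {
  pattern: RegExp;
  category: Category;
}

const RULES: MerchantRule[] = [
  { pattern: /^AMZN\*/i, category: "shopping" },
  { pattern: /^UBER\b/i, category: "transport" },
];

// Stand-in for the LLM call; a real implementation would call the model API here.
type LlmClassifier = (merchant: string) => Promise<Category>;

// Normalise the description so equivalent merchant strings share one cache key.
function normaliseMerchant(description: string): string {
  return description.trim().toUpperCase().replace(/\s+/g, " ");
}

// Step 1: rule-based matching. Returns null when no rule applies.
function matchRule(description: string): Category | null {
  for (const rule of RULES) {
    if (rule.pattern.test(description)) return rule.category;
  }
  return null;
}

// Step 2: LLM fallback, with the result cached against the merchant string.
class Categoriser {
  private cache = new Map<string, Category>();
  private readonly classifyWithLlm: LlmClassifier;

  constructor(classifyWithLlm: LlmClassifier) {
    this.classifyWithLlm = classifyWithLlm;
  }

  async categorise(description: string): Promise<Category> {
    const merchant = normaliseMerchant(description);

    const ruled = matchRule(merchant);
    if (ruled !== null) return ruled;

    const cached = this.cache.get(merchant);
    if (cached !== undefined) return cached;

    const category = await this.classifyWithLlm(merchant);
    this.cache.set(merchant, category);
    return category;
  }
}
```

Checking the rules before the cache means a newly added rule takes effect immediately, even for merchants the LLM has already classified.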
Week 5–6: The AI Chat Feature
The headline feature — natural language queries over your finances — needed to feel instant and accurate.
The architecture: On each query, we inject the last 90 days of transactions (summarised, not raw — to fit in context) plus account balances into the system prompt. The model then has full context to answer questions like:
- "How much did I spend on food last month?"
- "Am I on track with my budget this month?"
- "What's my biggest expense category?"
- "Did I spend more in January or February?"
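One way to get from raw transactions to a compact context is to aggregate spend per month and category before it goes into the prompt. The sketch below assumes a hypothetical `Txn` shape with amounts stored in minor units (pence or cents) to avoid floating-point drift; the field names and the output format are illustrative, and account balances are left out for brevity.

```typescript
// Hypothetical transaction shape, for illustration only.
interface Txn {
  date: string;        // ISO date, e.g. "2024-02-14"
  amountMinor: number; // money spent, in minor units (pence/cents)
  category: string;
}

// Total spend keyed by month, then category: totals["2024-02"]["food"].
type MonthlyTotals = Record<string, Record<string, number>>;

const DAY_MS = 24 * 60 * 60 * 1000;

// Aggregate transactions from the last `windowDays` days up to `today`.
function summariseSpend(txns: Txn[], today: string, windowDays = 90): MonthlyTotals {
  const end = new Date(today).getTime();
  const cutoff = end - windowDays * DAY_MS;
  const totals: MonthlyTotals = {};

  for (const t of txns) {
    const ts = new Date(t.date).getTime();
    if (ts < cutoff || ts > end) continue; // outside the window

    const month = t.date.slice(0, 7); // "YYYY-MM"
    const byCategory = totals[month] ?? (totals[month] = {});
    byCategory[t.category] = (byCategory[t.category] ?? 0) + t.amountMinor;
  }
  return totals;
}

// Render the totals as short, sorted lines that fit easily in a prompt.
function renderSpendContext(totals: MonthlyTotals): string {
  const lines: string[] = [];
  for (const month of Object.keys(totals).sort()) {
    const byCategory = totals[month] ?? {};
    for (const category of Object.keys(byCategory).sort()) {
      const amount = ((byCategory[category] ?? 0) / 100).toFixed(2);
      lines.push(`${month} ${category}: ${amount}`);
    }
  }
  return lines.join("\n");
}
```

With totals like these in context, a question such as "Did I spend more in January or February?" becomes a comparison between two lines rather than a sum over hundreds of rows.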
The hard part: Keeping responses grounded. LLMs will confidently answer questions they don't have data for. We added strict grounding instructions — if the model doesn't have data to answer precisely, it says so. No hallucinated numbers.
In fintech, a wrong number is worse than no number. We'd rather the AI say "I don't have enough data" than give an incorrect figure.
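As a rough illustration of what grounding instructions can look like, here is a sketch of a system prompt builder. The wording is hypothetical, not the prompt Finly shipped; the idea is to confine the model to the supplied data and give it an explicit fallback sentence to use when the data falls short.

```typescript
// Fallback reply the model is told to use when the data does not cover the question.
const NO_DATA_REPLY = "I don't have enough data to answer that.";

// Build a system prompt that restricts answers to the supplied financial context.
// The instruction wording is an assumption, shown for illustration only.
function buildGroundedPrompt(financialContext: string): string {
  return [
    "You are a personal finance assistant.",
    "Answer only from the data in the DATA section below.",
    "Never estimate, extrapolate, or invent figures.",
    `If the data does not cover the question, reply exactly: "${NO_DATA_REPLY}"`,
    "",
    "DATA:",
    financialContext,
  ].join("\n");
}
```

A fixed fallback sentence has a side benefit: eval tests can check for it verbatim when a question is deliberately unanswerable.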
We also streamed all responses — first-token latency was consistently under 400ms, which made the chat feel snappy.
Week 7: Polish and Performance
Week 7 was entirely polish. Performance, edge cases, error states, empty states, loading skeletons. This is the week most projects skip — and it shows.
Key things we fixed:
- Cold start on Cloud Run was causing 2–3 second delays for the first request. Fixed by setting minimum instances to 1, which keeps one instance warm at the cost of no longer scaling to zero.
- Transaction sync was running synchronously in-request. Moved to a background job with a webhook to notify the app when complete.
- Budget alert notifications were being sent at the wrong time. Added timezone-aware scheduling.
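The timezone fix can be sketched with the built-in `Intl.DateTimeFormat` API. This assumes a hypothetical setup in which a job runs every hour and sends an alert when it is a chosen local hour for the user; the function names and the 9am default are illustrative.

```typescript
// Hour of day (0-23) at the given instant in an IANA time zone such as "Europe/London".
function localHour(instant: Date, timeZone: string): number {
  const formatted = new Intl.DateTimeFormat("en-US", {
    hour: "numeric",
    hour12: false,
    timeZone,
  }).format(instant);
  // Some engines render midnight as "24" when hour12 is false, hence the modulo.
  return parseInt(formatted, 10) % 24;
}

// Hypothetical policy: an hourly job sends the alert when it is `sendHour` for the user.
function shouldSendBudgetAlert(now: Date, userTimeZone: string, sendHour = 9): boolean {
  return localHour(now, userTimeZone) === sendHour;
}
```

Resolving the hour from the user's time zone at send time, rather than storing a fixed UTC offset, means daylight saving changes are handled by the platform's time zone data.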
Week 8: Launch
App Store review, Play Store review, production infrastructure check, load test (simulated 500 concurrent users — Cloud Run handled it with no issues), and go-live.
Launch day metrics: 847 sign-ups in the first 24 hours. Zero critical errors. One minor bug caught by a user (a timezone edge case in budget resets — fixed within 2 hours).
What We'd Do Differently
- Start load testing earlier. We found the Cloud Run cold start issue in week 7. We could have caught it in week 3.
- Invest in a design system sooner. We rebuilt several components late in the project when we could have had consistent primitives from day one.
- More eval examples for the AI. We launched with 50 golden test examples. More would have caught a few edge cases we found post-launch.
The Stack Summary
- Frontend: React Native + Expo
- Backend: Node.js + Express on GCP Cloud Run
- Database: PostgreSQL via Supabase
- Auth: Supabase Auth
- Bank data: Plaid API
- AI: Gemini 1.5 Flash
- Infra: GCP (Cloud Run, Cloud SQL, Pub/Sub)
- CI/CD: GitHub Actions → Cloud Run
Building something similar? Get in touch — we'd love to talk through your product.