Case Study: Launching Finly — an AI Finance App

Eight weeks from first call to live product. A three-person team. A fintech space that's crowded, regulated, and unforgiving of poor UX. Here's how Finly went from idea to launch — and what we learned along the way.

The Brief

The client came to us with a clear problem statement: personal finance apps are either too simple (they just show you a graph of your spending) or too complex (accountant-grade tools most people don't understand). They wanted something in the middle — an app that could answer plain-English questions about your money.

Core requirement: "Ask me anything about my finances" — a conversational AI layer on top of personal finance data.

Secondary requirements:

  • Bank account connection (read-only)
  • Automatic transaction categorisation
  • Budget tracking with smart alerts
  • Monthly summary and insights
  • Mobile-first (iOS and Android)

Weeks 1–2: Scoping and Architecture

We spent the first two weeks on the things that save weeks later: clear scope, technical architecture, and a working skeleton.

Scope decisions we made early:

  • Read-only bank data only. No payments, no transfers. This halved the compliance surface area.
  • One country, one currency. Internationalisation deferred to v2.
  • Web app first, native shell second. React Native WebView wrapper meant one codebase with a native feel.

Architecture choices:

  • Backend: Node.js on GCP Cloud Run (serverless, scales to zero), with PostgreSQL as the database
  • Bank connectivity: Plaid API (fastest path to real bank data)
  • AI layer: Gemini 1.5 Flash with a structured system prompt and the user's transaction history as context
  • Mobile: React Native with Expo
  • Auth: Supabase (handles auth + row-level security out of the box)

Weeks 3–4: Core Data Pipeline

The heart of Finly is the data pipeline: connect bank → fetch transactions → categorise → store → make queryable.

Transaction categorisation was where we spent the most time. Bank transaction descriptions are notoriously messy — "AMZN*AB1234" isn't obviously Amazon to a naive classifier. We used a two-step approach:

  1. Rule-based matching for known merchant patterns (covers ~70% of transactions)
  2. LLM classification for the remaining ~30%, with the result cached against the merchant string

This kept per-user AI costs low, because each distinct merchant string is sent to the model at most once, while categorisation accuracy stayed high.
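
A minimal Python sketch of the two-step approach (illustrative only, not the production Node.js code). The rule table, the `Categoriser` class, the cache-key normalisation, and `stub_llm` are all hypothetical names and choices made for this example; `stub_llm` stands in for the real model call so the sketch runs offline.

```python
import re
from typing import Callable, Dict, List, Optional, Tuple

# Hypothetical rule table: (pattern, category). Order matters, so the more
# specific "UBER EATS" rule is listed before the generic "UBER" rule.
MERCHANT_RULES: List[Tuple["re.Pattern[str]", str]] = [
    (re.compile(r"^AMZN\*|AMAZON", re.IGNORECASE), "Shopping"),
    (re.compile(r"UBER\s*EATS|DELIVEROO", re.IGNORECASE), "Food"),
    (re.compile(r"^TFL\b|UBER", re.IGNORECASE), "Transport"),
]


def match_rule(description: str) -> Optional[str]:
    """Step 1: return the category of the first matching rule, if any."""
    for pattern, category in MERCHANT_RULES:
        if pattern.search(description):
            return category
    return None


class Categoriser:
    """Rules first; fall back to a model call cached per merchant string."""

    def __init__(self, llm_classify: Callable[[str], str]) -> None:
        self._llm_classify = llm_classify
        self._cache: Dict[str, str] = {}
        self.llm_calls = 0  # exposed so cost can be observed

    def categorise(self, description: str) -> str:
        rule_hit = match_rule(description)
        if rule_hit is not None:
            return rule_hit
        # Assumption: the cache key is the trimmed, upper-cased description.
        key = description.strip().upper()
        if key not in self._cache:
            self.llm_calls += 1
            self._cache[key] = self._llm_classify(key)
        return self._cache[key]


def stub_llm(merchant: str) -> str:
    """Stand-in for the real model call."""
    return "Groceries" if "MARKET" in merchant else "Other"
```

With `c = Categoriser(stub_llm)`, calling `c.categorise("AMZN*AB1234")` is answered by the rules without touching the model, and repeated calls for the same unknown merchant reach the model once.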

Weeks 5–6: The AI Chat Feature

The headline feature — natural language queries over your finances — needed to feel instant and accurate.

The architecture: on each query, we inject a summary of the last 90 days of transactions (summarised rather than raw, so it fits in the context window) plus account balances into the system prompt. That gives the model enough context to answer questions like:

  • "How much did I spend on food last month?"
  • "Am I on track with my budget this month?"
  • "What's my biggest expense category?"
  • "Did I spend more in January or February?"
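
One way to summarise rather than send raw transactions is to roll them up by month and category. The Python sketch below is an assumption about what such a summary could look like, not the production code; `Transaction`, `summarise`, `render_context`, and the pipe-separated line format are hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import date, timedelta
from decimal import Decimal
from typing import Dict, Iterable, Tuple


@dataclass(frozen=True)
class Transaction:
    posted: date
    category: str
    amount: Decimal  # assumption: positive means money spent


def summarise(
    transactions: Iterable[Transaction],
    today: date,
    window_days: int = 90,
) -> Dict[Tuple[str, str], Decimal]:
    """Total spend per (month, category) inside the trailing window."""
    cutoff = today - timedelta(days=window_days)
    totals: Dict[Tuple[str, str], Decimal] = defaultdict(Decimal)
    for tx in transactions:
        if cutoff <= tx.posted <= today:
            month = tx.posted.strftime("%Y-%m")
            totals[(month, tx.category)] += tx.amount
    return dict(totals)


def render_context(totals: Dict[Tuple[str, str], Decimal]) -> str:
    """Render the totals as compact lines suitable for a prompt."""
    return "\n".join(
        f"{month} | {category} | {amount:.2f}"
        for (month, category), amount in sorted(totals.items())
    )
```

Monthly totals per category are enough for questions such as "January or February?" while using far fewer tokens than a raw transaction list. Amounts are kept as `Decimal` so that sums do not pick up floating-point rounding errors.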

The hard part: keeping responses grounded. LLMs will confidently answer questions they don't have data for, so we added strict grounding instructions: if the model doesn't have the data to answer precisely, it is told to say so rather than produce a number.

In fintech, a wrong number is worse than no number. We'd rather the AI say "I don't have enough data" than give an incorrect figure.
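
As an illustration of what grounding instructions can look like, here is a short Python sketch that assembles a system prompt. The wording of the rules, the `DATA` section layout, and the names `REFUSAL`, `GROUNDING_RULES`, and `build_system_prompt` are assumptions for this example, not the prompt Finly shipped.

```python
from typing import Dict

# Hypothetical fixed refusal text, so refusals are easy to detect downstream.
REFUSAL = "I don't have enough data to answer that."

GROUNDING_RULES = "\n".join([
    "You answer questions about the user's personal finances.",
    "Use only the figures in the DATA section; never estimate or invent a number.",
    f'If DATA does not cover the question, reply exactly: "{REFUSAL}"',
])


def build_system_prompt(summary: str, balances: Dict[str, str]) -> str:
    """Combine the grounding rules with the per-user data for one query."""
    balance_lines = "\n".join(
        f"{name}: {amount}" for name, amount in sorted(balances.items())
    )
    return (
        f"{GROUNDING_RULES}\n\n"
        "DATA\n"
        f"Balances:\n{balance_lines}\n"
        f"Spending (last 90 days):\n{summary}"
    )
```

Instructions like these reduce invented figures but cannot guarantee their absence, which is why a set of test questions with known answers is still worth running against the prompt.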

We also streamed all responses. First-token latency was consistently under 400 ms, which made the chat feel snappy.
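
First-token latency is easy to measure wherever chunks are forwarded to the client. The Python sketch below shows the idea under stated assumptions: `stream_with_timing` is a hypothetical helper, and `fake_model_stream` stands in for a real streaming model client.

```python
import time
from typing import Callable, Iterable, Iterator, List, Optional, Tuple


def stream_with_timing(
    chunks: Iterable[str],
    on_chunk: Callable[[str], None],
) -> Tuple[str, Optional[float], float]:
    """Forward chunks as they arrive; return (text, first_token_s, total_s)."""
    start = time.perf_counter()
    first_token_s: Optional[float] = None
    parts: List[str] = []
    for chunk in chunks:
        if first_token_s is None:
            # Time to first token: when the user sees the reply start.
            first_token_s = time.perf_counter() - start
        on_chunk(chunk)  # e.g. write to the open HTTP response
        parts.append(chunk)
    return "".join(parts), first_token_s, time.perf_counter() - start


def fake_model_stream() -> Iterator[str]:
    """Stand-in for a streaming model client, roughly 10 ms per chunk."""
    for chunk in ["You spent ", "42.10 ", "on food."]:
        time.sleep(0.01)
        yield chunk
```

Calling `stream_with_timing(fake_model_stream(), print)` prints each chunk as it arrives and returns the full text plus both timings. `first_token_s` is `None` when the stream produced nothing.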

Week 7: Polish and Performance

Week 7 was entirely polish. Performance, edge cases, error states, empty states, loading skeletons. This is the week most projects skip — and it shows.

Key things we fixed:

  • Cold starts on Cloud Run were causing 2–3 second delays on the first request after the service had scaled to zero. Fixed by setting minimum instances to 1, which trades scale-to-zero for one always-warm instance.
  • Transaction sync was running synchronously in-request. Moved to a background job with a webhook to notify the app when complete.
  • Budget alert notifications were being sent at the wrong time. Added timezone-aware scheduling.
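
Timezone-aware scheduling comes down to choosing the alert time in the user's local zone and converting it to UTC for the scheduler. This is a Python sketch of that conversion, not the production code; `next_alert_utc` and its parameters are hypothetical.

```python
from datetime import datetime, time, timedelta, timezone, tzinfo


def next_alert_utc(now_utc: datetime, user_tz: tzinfo, local_time: time) -> datetime:
    """Return the next UTC instant at which it is `local_time` for the user."""
    now_local = now_utc.astimezone(user_tz)
    # Today's alert time on the user's wall clock.
    candidate = datetime.combine(now_local.date(), local_time, tzinfo=user_tz)
    if candidate <= now_local:
        # Already passed today in the user's zone, so schedule tomorrow.
        candidate = datetime.combine(
            now_local.date() + timedelta(days=1), local_time, tzinfo=user_tz
        )
    return candidate.astimezone(timezone.utc)
```

In practice `user_tz` would be a named zone such as `zoneinfo.ZoneInfo("Europe/London")` from the standard library (Python 3.9+), so daylight saving changes are handled. Note that "today" is taken from the user's local date, not the server's UTC date, which is where a budget reset can fire on the wrong day.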

Week 8: Launch

App Store review, Play Store review, production infrastructure check, load test (simulated 500 concurrent users — Cloud Run handled it with no issues), and go-live.

Launch day metrics: 847 sign-ups in the first 24 hours. Zero critical errors. One minor bug caught by a user (a timezone edge case in budget resets — fixed within 2 hours).

What We'd Do Differently

  • Start load testing earlier. We found the Cloud Run cold start issue in week 7. We could have caught it in week 3.
  • Invest in a design system sooner. We rebuilt several components late in the project when we could have had consistent primitives from day one.
  • More eval examples for the AI. We launched with 50 golden test examples. More would have caught a few edge cases we found post-launch.
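
A golden-example check for the chat feature can be quite small. The Python sketch below assumes each example carries either an expected figure or an expectation that the assistant declines; `GoldenExample`, `passes`, `run_evals`, and the refusal marker are hypothetical and not the harness used on the project.

```python
import re
from dataclasses import dataclass
from typing import Callable, Iterable, List, Optional

# Assumption: refusals contain this phrase (compared case-insensitively).
REFUSAL_MARKER = "don't have enough data"


@dataclass(frozen=True)
class GoldenExample:
    question: str
    expected_amount: Optional[str]  # None means the assistant should decline


def passes(example: GoldenExample, answer: str) -> bool:
    """Check one answer against its golden expectation."""
    if example.expected_amount is None:
        # A correct refusal says so and contains no figures at all.
        return REFUSAL_MARKER in answer.lower() and not re.search(r"\d", answer)
    # Drop thousands separators, then look for the exact expected figure.
    amounts = re.findall(r"\d+(?:\.\d+)?", answer.replace(",", ""))
    return example.expected_amount in amounts


def run_evals(
    examples: Iterable[GoldenExample],
    ask: Callable[[str], str],
) -> List[GoldenExample]:
    """Return the examples whose answers failed the check."""
    return [ex for ex in examples if not passes(ex, ask(ex.question))]
```

Here `ask` is whatever function sends a question to the assistant and returns its reply. Running this in CI against a fixed test account makes a wrong or invented figure show up as a failing example before release.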

The Stack Summary

  • Frontend: React Native + Expo
  • Backend: Node.js + Express on GCP Cloud Run
  • Database: PostgreSQL via Supabase
  • Auth: Supabase Auth
  • Bank data: Plaid API
  • AI: Gemini 1.5 Flash
  • Infra: GCP (Cloud Run, Cloud SQL, Pub/Sub)
  • CI/CD: GitHub Actions → Cloud Run
