
Case Study: Launching Finly — an AI Finance App

20 March 2025 · 9 min read · NotchAI Team

Eight weeks from first call to live product. A three-person team. A fintech space that's crowded, regulated, and unforgiving of poor UX. Here's how Finly went from idea to launch — and what we learned along the way.

The Brief

The client came to us with a clear problem statement: personal finance apps are either too simple (they just show you a graph of your spending) or too complex (accountant-grade tools most people don't understand). They wanted something in the middle — an app that could answer plain-English questions about your money.

Core requirement: "Ask me anything about my finances" — a conversational AI layer on top of personal finance data.

Secondary requirements:

Bank account connection (read-only)
Automatic transaction categorisation
Budget tracking with smart alerts
Monthly summary and insights
Mobile-first (iOS and Android)

Week 1–2: Scoping and Architecture

We spent the first two weeks on the things that save weeks later: clear scope, technical architecture, and a working skeleton.

Scope decisions we made early:

Read-only bank data only. No payments, no transfers. This halved the compliance surface area.
One country, one currency. Internationalisation deferred to v2.
Web app first, native shell second. A React Native WebView wrapper gave us one codebase with a native feel.

Architecture choices:

Backend: Node.js on GCP Cloud Run (serverless, scales to zero), backed by PostgreSQL
Bank connectivity: Plaid API (fastest path to real bank data)
AI layer: Gemini 1.5 Flash with a structured system prompt and the user's transaction history as context
Mobile: React Native with Expo
Auth: Supabase (handles auth + row-level security out of the box)

Week 3–4: Core Data Pipeline

The heart of Finly is the data pipeline: connect bank → fetch transactions → categorise → store → make queryable.

Transaction categorisation was where we spent the most time. Bank transaction descriptions are notoriously messy — "AMZN*AB1234" isn't obviously Amazon to a naive classifier. We used a two-step approach:

Rule-based matching for known merchant patterns (covers ~70% of transactions)
LLM classification for the remaining ~30%, with the result cached against the merchant string

Because a merchant string that has been classified once is served from the cache afterwards, this kept per-user AI costs low while maintaining high categorisation accuracy.
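To make the two-step approach concrete, here is a minimal sketch rather than Finly's production code. The rule table, the category names, the `normalise` helper and the injected `classifyWithLlm` function are all assumptions made for illustration, and the in-memory `Map` stands in for whatever cache store is actually used.

```typescript
// Hypothetical names throughout; a sketch of rules-first, LLM-second categorisation.
type Category = string;

interface MerchantRule {
  pattern: RegExp;
  category: Category;
}

// Step 1 data: known merchant patterns (illustrative entries only).
const RULES: MerchantRule[] = [
  { pattern: /^AMZN\*/i, category: "Shopping" },
  { pattern: /\bUBER\b/i, category: "Transport" },
];

// Cache keyed by the normalised merchant string, so a string that has
// already been classified is not sent to the LLM again.
const llmCache = new Map<string, Category>();

// Trim, upper-case and collapse whitespace so near-identical strings share a key.
function normalise(description: string): string {
  return description.trim().toUpperCase().replace(/\s+/g, " ");
}

// Returns the category of the first matching rule, or undefined if none match.
function matchRule(merchant: string): Category | undefined {
  return RULES.find((rule) => rule.pattern.test(merchant))?.category;
}

async function categorise(
  description: string,
  classifyWithLlm: (merchant: string) => Promise<Category>,
): Promise<Category> {
  const merchant = normalise(description);

  // Step 1: rule-based matching.
  const ruled = matchRule(merchant);
  if (ruled !== undefined) return ruled;

  // Step 2: LLM fallback, cached against the merchant string.
  const cached = llmCache.get(merchant);
  if (cached !== undefined) return cached;

  const category = await classifyWithLlm(merchant);
  llmCache.set(merchant, category);
  return category;
}
```

Passing the classifier in as a function keeps the sketch independent of any particular model SDK and makes the cache behaviour easy to test with a stub.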

Week 5–6: The AI Chat Feature

The headline feature — natural language queries over your finances — needed to feel instant and accurate.

The architecture: On each query, we inject the last 90 days of transactions (summarised, not raw — to fit in context) plus account balances into the system prompt. The model then has full context to answer questions like:

"How much did I spend on food last month?"
"Am I on track with my budget this month?"
"What's my biggest expense category?"
"Did I spend more in January or February?"

The hard part: Keeping responses grounded. LLMs will confidently answer questions they don't have data for. We added strict grounding instructions — if the model doesn't have data to answer precisely, it says so. No hallucinated numbers.

In fintech, a wrong number is worse than no number. We'd rather the AI say "I don't have enough data" than give an incorrect figure.
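The context injection and grounding rules described above might look roughly like the sketch below. The `Transaction` and `AccountBalance` shapes, the function names and the exact wording of the rules are assumptions for illustration, not the prompt Finly ships. Amounts are held in minor units (for example pence) so that totals do not pick up floating-point drift, and the 90-day filtering is assumed to have happened before `summarise` is called.

```typescript
interface Transaction {
  date: string;        // ISO date, e.g. "2025-02-14"
  amountMinor: number; // spend in minor units (e.g. pence)
  category: string;
}

interface AccountBalance {
  name: string;
  balanceMinor: number;
}

// Format minor units as a decimal string with two places.
function formatMinor(minor: number): string {
  return (minor / 100).toFixed(2);
}

// Collapse raw transactions into one line per month and category,
// which is far smaller than sending every transaction to the model.
function summarise(transactions: Transaction[]): string[] {
  const totals = new Map<string, number>();
  for (const tx of transactions) {
    const key = `${tx.date.slice(0, 7)} ${tx.category}`; // "YYYY-MM Category"
    totals.set(key, (totals.get(key) ?? 0) + tx.amountMinor);
  }
  return Array.from(totals.keys())
    .sort()
    .map((key) => `${key}: ${formatMinor(totals.get(key) ?? 0)}`);
}

// Hypothetical grounding rules; the wording is illustrative only.
const GROUNDING_RULES = [
  "Answer only from the data in this prompt.",
  "Never estimate, extrapolate, or invent a figure.",
  "If the data does not cover the question, reply: I don't have enough data to answer that.",
];

function buildSystemPrompt(transactions: Transaction[], balances: AccountBalance[]): string {
  return [
    "You are a personal finance assistant.",
    "RULES:",
    ...GROUNDING_RULES,
    "SPENDING SUMMARY (last 90 days):",
    ...summarise(transactions),
    "ACCOUNT BALANCES:",
    ...balances.map((b) => `${b.name}: ${formatMinor(b.balanceMinor)}`),
  ].join("\n");
}
```

Prompt rules like these reduce hallucinated figures but cannot guarantee their absence, which is one reason a set of golden test questions is worth maintaining alongside the prompt.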

We also streamed all responses — first-token latency was consistently under 400ms, which made the chat feel snappy.
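For readers who want to track first-token latency themselves, here is a small sketch of consuming a streamed response while recording timings. It assumes the model client exposes the response as an `AsyncIterable<string>` of text chunks; `consumeWithTiming` and `StreamTiming` are hypothetical names, not part of any SDK.

```typescript
interface StreamTiming {
  text: string;                // full response, concatenated
  firstTokenMs: number | null; // null if the stream produced no chunks
  totalMs: number;
}

// Forward each chunk to the UI as it arrives and record how long the
// first chunk and the whole response took.
async function consumeWithTiming(
  stream: AsyncIterable<string>,
  onChunk: (chunk: string) => void = () => {},
): Promise<StreamTiming> {
  const start = Date.now();
  let firstTokenMs: number | null = null;
  let text = "";

  for await (const chunk of stream) {
    if (firstTokenMs === null) firstTokenMs = Date.now() - start;
    text += chunk;
    onChunk(chunk);
  }

  return { text, firstTokenMs, totalMs: Date.now() - start };
}
```

Logging `firstTokenMs` per request is one way to check a latency target in production rather than only in development.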

Week 7: Polish and Performance

Week 7 was entirely polish. Performance, edge cases, error states, empty states, loading skeletons. This is the week most projects skip — and it shows.

Key things we fixed:

Cold start on Cloud Run was causing 2–3 second delays for the first request. Fixed with minimum instances set to 1.
Transaction sync was running synchronously in-request. Moved to a background job with a webhook to notify the app when complete.
Budget alert notifications were being sent at the wrong time. Added timezone-aware scheduling.
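As an illustration of the timezone fix, the sketch below decides whether an alert is due by looking at the hour in the user's own timezone instead of the server's. It assumes each user record stores an IANA timezone name such as "Europe/London" and that a scheduler runs the check at least once an hour; `localHour` and `isAlertDue` are hypothetical helpers.

```typescript
// Hour of day (0-23) at `instant` in the given IANA timezone.
// The en-GB locale uses a 24-hour clock, so the hour part is "00" to "23".
// Throws a RangeError if the timezone name is not recognised.
function localHour(instant: Date, timeZone: string): number {
  const parts = new Intl.DateTimeFormat("en-GB", {
    timeZone,
    hour: "2-digit",
  }).formatToParts(instant);
  const hourPart = parts.find((part) => part.type === "hour");
  return Number(hourPart?.value);
}

// True when the user's local hour matches the hour they should be alerted at.
function isAlertDue(instant: Date, timeZone: string, sendAtHour: number): boolean {
  return localHour(instant, timeZone) === sendAtHour;
}

// Usage: the same instant is a different local hour for different users.
const instant = new Date("2025-03-20T08:30:00Z");
const dueInLondon = isAlertDue(instant, "Europe/London", 8);
const dueInTokyo = isAlertDue(instant, "Asia/Tokyo", 8);
```

Leaning on `Intl.DateTimeFormat` means daylight-saving transitions are handled by the runtime's timezone data rather than by hand-written offsets.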

Week 8: Launch

App Store review, Play Store review, production infrastructure check, load test (simulated 500 concurrent users — Cloud Run handled it with no issues), and go-live.

Launch day metrics: 847 sign-ups in the first 24 hours. Zero critical errors. One minor bug caught by a user (a timezone edge case in budget resets — fixed within 2 hours).

What We'd Do Differently

Start load testing earlier. We found the Cloud Run cold start issue in week 7. We could have caught it in week 3.
Invest in a design system sooner. We rebuilt several components late in the project when we could have had consistent primitives from day one.
More eval examples for the AI. We launched with 50 golden test examples. More would have caught a few edge cases we found post-launch.
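A golden-example harness does not need to be elaborate. The sketch below is one possible shape, assuming each example pairs a question with a substring that a correct answer must contain; the `GoldenExample` format, `runGoldenSet` and the injected `ask` function are hypothetical and not a description of Finly's eval tooling.

```typescript
interface GoldenExample {
  question: string;
  mustContain: string; // substring expected in a correct answer
}

interface EvalResult {
  passed: number;
  failed: GoldenExample[];
}

// Run every example through the model and collect the ones whose
// answer does not contain the expected substring.
async function runGoldenSet(
  examples: GoldenExample[],
  ask: (question: string) => Promise<string>,
): Promise<EvalResult> {
  const failed: GoldenExample[] = [];
  for (const example of examples) {
    const answer = await ask(example.question);
    if (!answer.includes(example.mustContain)) failed.push(example);
  }
  return { passed: examples.length - failed.length, failed };
}
```

Substring matching is deliberately crude. It suits checks such as "the answer contains this exact figure" or "the answer contains the no-data fallback", which are the cases that matter most when a wrong number is worse than no number.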

The Stack Summary

Frontend: React Native + Expo
Backend: Node.js + Express on GCP Cloud Run
Database: PostgreSQL via Supabase
Auth: Supabase Auth
Bank data: Plaid API
AI: Gemini 1.5 Flash
Infra: GCP (Cloud Run, Cloud SQL, Pub/Sub)
CI/CD: GitHub Actions → Cloud Run

Building something similar? Get in touch — we'd love to talk through your product.
