Precision Data Ingestion Flow
WealthFam features a multi-channel ingestion pipeline designed to minimize manual entry while keeping the ledger accurate and free of duplicates through idempotent hashing and tiered parsing logic.
Multi-Channel Ingestion
Our system captures data from three primary streams:
- Mobile SMS Ingestion: Real-time capture of bank transaction alerts directly from Android background listeners.
- Email IMAP Sync: Periodic scanning of authorized email accounts for bank statements and CAMS PDFs.
- Manual Web Ingestion: High-fidelity bulk uploads via CSV, Excel, or PDF files through the desktop dashboard.
The Pipeline Lifecycle
sequenceDiagram
participant Raw as Raw Source (SMS/Email/Web)
participant Dedup as Deduplication (MD5)
participant Filter as Logic Filter
participant Engine as Tiered Parser Engine
participant LLM as Gemini AI Fallback
participant Ledger as Confirmed Ledger
Raw->>Dedup: Input Raw String/File
Dedup->>Dedup: Generate Unique ID (MD5)
Dedup-->>Raw: Reject if ID exists (Idempotency)
Dedup->>Filter: Pass Unique Payload
Filter->>Filter: Apply Ignore Patterns (e.g., OTPs, Spam)
Filter->>Engine: Structured Data Stream
alt Static Match
Engine->>Engine: Match Static Regex/Template
else Pattern Guessing
Engine->>Engine: Heuristic Inference
else AI Parsing
Engine->>LLM: Request Unstructured Extraction
LLM-->>Engine: Structured JSON Response
end
Engine->>Ledger: Commit Synchronized Transaction
CAMS/KFintech Lifecycle
Our investment sync handles the extraction of investment positions from password-protected statement PDFs issued by the CAMS and KFintech registrars.
graph TD
Email[IMAP Sync Engine] -->|Find Attachments| PDF[CAMS/KFintech PDF]
PDF -->|Secure Password| Extraction[Precision Extraction Microservice]
Extraction -->|Mask PII| Anonymized[Anonymized JSON]
Anonymized -->|Mapping| Positions[Investment Positions]
Positions -->|XIRR Engine| UI[Dashboard Portfolio View]
Real-Time SMS Triage
Incoming bank alerts are routed by parsing confidence: high-confidence results (>95%) are committed automatically, and everything else is queued for review.
sequenceDiagram
participant App as Android Listener
participant Parser as Ingestion MS
participant Queue as Triage Queue
participant Ledger as Ledger DB
App->>Parser: Encrypted SMS + GPS
Parser->>Parser: Confidence Check
alt High Confidence (>95%)
Parser->>Ledger: Auto-Commit Transaction
Ledger-->>App: WebSocket Push (Confirmed)
else Low Confidence
Parser->>Queue: Flag for Review
Queue-->>App: Notification (Needs Review)
end
Key Technical Features
1. Idempotency (MD5 Hashing)
Every ingestion attempt derives an MD5 hash from the raw payload, timestamp, and source metadata. Because identical inputs always produce the same hash, the same SMS or email is never processed twice, even if the sync service restarts and reads it again.
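A minimal sketch of this check is shown here, not WealthFam's actual implementation. The names `ingestion_id` and `IngestionGate`, the field separator, and the in-memory set are all assumptions made for illustration. The sketch also assumes the timestamp is the message's own timestamp rather than the time of ingestion, since only then does a re-read produce the same hash. Surviving a restart would additionally require a persistent store, such as a unique index in the database.

```python
import hashlib


def ingestion_id(raw_payload: str, message_timestamp: str, source: str) -> str:
    """Return a deterministic MD5 fingerprint for one ingested message."""
    # Join the fields with a unit-separator character so that
    # ("ab", "c") and ("a", "bc") cannot collapse into the same string.
    material = "\x1f".join([source, message_timestamp, raw_payload])
    return hashlib.md5(material.encode("utf-8")).hexdigest()


class IngestionGate:
    """Hypothetical gate that rejects payloads whose ID was already seen.

    The in-memory set is a stand-in for a persistent store.
    """

    def __init__(self) -> None:
        self._seen = set()

    def accept(self, raw_payload: str, message_timestamp: str, source: str) -> bool:
        key = ingestion_id(raw_payload, message_timestamp, source)
        if key in self._seen:
            return False  # duplicate delivery: skip processing
        self._seen.add(key)
        return True
```

MD5 is used here only as a fingerprint for deduplication, not as a security control.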
2. Tiered Parsing Logic
To maximize speed and minimize costs, we use a tiered approach:
- Tier 1 (Static): Hardcoded templates for major banks (HDFC, ICICI, etc.).
- Tier 2 (Heuristics): Algorithmic identification of currency, amounts, and vendors.
- Tier 3 (AI Fallback): Real-time inference using Gemini Pro for highly complex or non-standard messages.
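The tier order above could be sketched roughly as follows. The regular expressions, the sample message format, and the function names are assumptions made for illustration and do not reflect real bank templates. The AI tier is represented by a caller-supplied `llm_fallback` callable so that no particular Gemini client API is implied.

```python
import re
from typing import Callable, Optional

# Tier 1: one hypothetical static template; real bank formats will differ.
STATIC_TEMPLATES = [
    re.compile(
        r"Rs\.?\s*(?P<amount>\d[\d,]*(?:\.\d+)?) debited from a/c \w+ "
        r"at (?P<vendor>.+?) on"
    ),
]

# Tier 2: heuristic that looks for the first currency-like amount.
AMOUNT_RE = re.compile(r"(?:Rs\.?|INR|₹)\s*(\d[\d,]*(?:\.\d+)?)", re.IGNORECASE)


def parse_static(text: str) -> Optional[dict]:
    for template in STATIC_TEMPLATES:
        match = template.search(text)
        if match:
            return {
                "amount": float(match.group("amount").replace(",", "")),
                "vendor": match.group("vendor"),
                "tier": "static",
            }
    return None


def parse_heuristic(text: str) -> Optional[dict]:
    match = AMOUNT_RE.search(text)
    if match is None:
        return None
    amount = float(match.group(1).replace(",", ""))
    return {"amount": amount, "vendor": None, "tier": "heuristic"}


def parse_message(
    text: str, llm_fallback: Optional[Callable[[str], Optional[dict]]] = None
) -> Optional[dict]:
    """Try the cheap tiers first; call the (costly) fallback only if both miss."""
    for tier in (parse_static, parse_heuristic):
        result = tier(text)
        if result is not None:
            return result
    return llm_fallback(text) if llm_fallback is not None else None
```

Ordering the tiers from cheapest to most expensive means the fallback is invoked only for messages that neither the templates nor the heuristics can handle.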
3. Precision Deduplication
Beyond simple IDs, WealthFam uses "Fuzzy Time-Windowing" to identify duplicate transactions across different sources (e.g., an SMS alert and a bank statement entry for the same coffee purchase).
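One plausible reading of fuzzy time-windowing is sketched below. The `Txn` fields, the two-day window, and the rule that only records from different sources are compared are assumptions, not documented WealthFam behavior. A window of days rather than minutes is chosen here because statement entries often carry only a posting date.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class Txn:
    source: str          # e.g. "sms" or "statement" (hypothetical labels)
    amount: float
    account: str
    occurred_at: datetime


def is_probable_duplicate(
    a: Txn, b: Txn, window: timedelta = timedelta(days=2)
) -> bool:
    """Treat two records as the same purchase if they come from different
    sources, share account and amount, and fall within the time window."""
    if a.source == b.source:
        return False  # same-source repeats are handled by the hash check
    if a.account != b.account:
        return False
    if abs(a.amount - b.amount) > 0.005:
        return False
    return abs(a.occurred_at - b.occurred_at) <= window


sms = Txn("sms", 250.0, "XX1234", datetime(2024, 1, 5, 10, 15))
statement = Txn("statement", 250.0, "XX1234", datetime(2024, 1, 6))
print(is_probable_duplicate(sms, statement))
```

A match like this is only a candidate. Two genuine purchases of the same amount within the window would also match, so a real system would likely need extra signals such as the vendor name.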
