
📥 Precision Data Ingestion Flow

WealthFam features a multi-channel ingestion pipeline designed to minimize manual entry while protecting data accuracy: idempotent hashing prevents duplicate records, and tiered parsing logic extracts transaction details from each message.


⚡ Multi-Channel Ingestion

Our system captures data from three primary streams:

  1. Mobile SMS Ingestion: Real-time capture of bank transaction alerts directly from Android background listeners.
  2. Email IMAP Sync: Periodic scanning of authorized email accounts for bank statements and CAMS PDFs.
  3. Manual Web Ingestion: High-fidelity bulk uploads via CSV, Excel, or PDF files through the desktop dashboard.

๐Ÿ—๏ธ The Pipeline Lifecycle โ€‹

```mermaid
sequenceDiagram
    participant Raw as Raw Source (SMS/Email/Web)
    participant Dedup as Deduplication (MD5)
    participant Filter as Logic Filter
    participant Engine as Tiered Parser Engine
    participant LLM as Gemini AI Fallback
    participant Ledger as Confirmed Ledger

    Raw->>Dedup: Input Raw String/File
    Dedup->>Dedup: Generate Unique ID (MD5)
    Dedup-->>Raw: Reject if ID exists (Idempotency)

    Dedup->>Filter: Pass Unique Payload
    Filter->>Filter: Apply Ignore Patterns (e.g., OTPs, Spam)

    Filter->>Engine: Structured Data Stream

    alt Static Match
        Engine->>Engine: Match Static Regex/Template
    else Pattern Guessing
        Engine->>Engine: Heuristic Inference
    else AI Parsing
        Engine->>LLM: Request Unstructured Extraction
        LLM-->>Engine: Structured JSON Response
    end

    Engine->>Ledger: Commit Synchronized Transaction
```
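The Logic Filter step in the diagram drops non-transactional messages before any parsing work is spent on them. A minimal sketch, assuming the ignore rules are plain regular expressions; the pattern list and the `should_ignore` / `filter_payloads` names are hypothetical, not WealthFam's actual rules:

```python
import re

# Hypothetical ignore patterns; real rules would be tuned per bank and sender.
IGNORE_PATTERNS = [
    re.compile(r"\bOTP\b", re.IGNORECASE),                    # one-time passwords
    re.compile(r"\bone[- ]time password\b", re.IGNORECASE),
    re.compile(r"\b(pre-approved|limited offer)\b", re.IGNORECASE),  # promotional spam
]


def should_ignore(message: str) -> bool:
    """Return True if the message matches any ignore pattern and should be dropped."""
    return any(p.search(message) for p in IGNORE_PATTERNS)


def filter_payloads(messages: list[str]) -> list[str]:
    """Keep only the messages that may describe a transaction."""
    return [m for m in messages if not should_ignore(m)]
```

In this arrangement, OTPs and promotional messages are discarded before they can reach the more expensive AI tier.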

๐Ÿฆ CAMS/KFintech Lifecycle โ€‹

Our investment sync extracts holdings from the password-protected Consolidated Account Statement (CAS) PDFs issued by the mutual fund registrars CAMS and KFintech.

```mermaid
graph TD
    Email[IMAP Sync Engine] -->|Find Attachments| PDF[CAMS/KFintech PDF]
    PDF -->|Secure Password| Extraction[Precision Extraction Microservice]
    Extraction -->|Mask PII| Anonymized[Anonymized JSON]
    Anonymized -->|Mapping| Positions[Investment Positions]
    Positions -->|XIRR Engine| UI[Dashboard Portfolio View]
```
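The last step in the diagram feeds positions into an XIRR engine. XIRR is the annualised rate r at which the net present value of dated cash flows is zero, Σ CFᵢ / (1 + r)^((dᵢ - d₀) / 365) = 0, with purchases as negative flows and redemptions or current value as positive flows. The sketch below solves for r by bisection and assumes an actual/365 day count (the convention spreadsheet XIRR functions use); the `xnpv` / `xirr` names are illustrative, and WealthFam's engine may use a different root-finder such as Newton's method.

```python
from datetime import date


def xnpv(rate: float, flows: list[tuple[date, float]]) -> float:
    """Net present value of dated cash flows, discounted on an actual/365 basis."""
    t0 = min(d for d, _ in flows)
    return sum(cf / (1.0 + rate) ** ((d - t0).days / 365.0) for d, cf in flows)


def xirr(flows: list[tuple[date, float]], lo: float = -0.9999, hi: float = 10.0,
         tol: float = 1e-9, max_iter: int = 200) -> float:
    """Annualised internal rate of return: bisection on xnpv(rate) = 0."""
    f_lo, f_hi = xnpv(lo, flows), xnpv(hi, flows)
    if f_lo * f_hi > 0:
        raise ValueError("XIRR not bracketed; cash flows need both signs")
    for _ in range(max_iter):
        mid = (lo + hi) / 2.0
        f_mid = xnpv(mid, flows)
        if abs(f_mid) < tol or (hi - lo) / 2.0 < tol:
            return mid
        # Keep the half-interval where the sign change (the root) lies.
        if f_lo * f_mid < 0:
            hi, f_hi = mid, f_mid
        else:
            lo, f_lo = mid, f_mid
    return (lo + hi) / 2.0
```

For a portfolio, the final positive flow would typically be the current market value of the holding on the valuation date.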

📱 Real-Time SMS Triage

Incoming bank alerts are routed by parser confidence: high-confidence parses are committed automatically, while uncertain ones are queued for user review.

```mermaid
sequenceDiagram
    participant App as Android Listener
    participant Parser as Ingestion MS
    participant Queue as Triage Queue
    participant Ledger as Ledger DB

    App->>Parser: Encrypted SMS + GPS
    Parser->>Parser: Confidence Check
    alt High Confidence (>95%)
        Parser->>Ledger: Auto-Commit Transaction
        Ledger-->>App: WebSocket Push (Confirmed)
    else Low Confidence
        Parser->>Queue: Flag for Review
        Queue-->>App: Notification (Needs Review)
    end
```
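A minimal sketch of the routing decision in the diagram, assuming the parser emits a confidence score between 0 and 1. The class names and the in-memory lists standing in for the Ledger DB and Triage Queue are hypothetical; the WebSocket push and the review notification are represented only by the returned status.

```python
from dataclasses import dataclass, field

AUTO_COMMIT_THRESHOLD = 0.95  # mirrors the ">95%" branch in the diagram


@dataclass
class ParsedSms:
    amount: float
    merchant: str
    confidence: float  # parser's confidence score in [0, 1]


@dataclass
class Triage:
    ledger: list[ParsedSms] = field(default_factory=list)        # stands in for Ledger DB
    review_queue: list[ParsedSms] = field(default_factory=list)  # stands in for Triage Queue

    def route(self, parsed: ParsedSms) -> str:
        """Auto-commit high-confidence parses; queue everything else for review."""
        if parsed.confidence > AUTO_COMMIT_THRESHOLD:
            self.ledger.append(parsed)
            return "confirmed"      # a real service would push this over WebSocket
        self.review_queue.append(parsed)
        return "needs_review"       # a real service would send a review notification
```

Usage: `Triage().route(ParsedSms(250.0, "CAFE", 0.98))` returns `"confirmed"`, while a score of exactly 0.95 falls into the review branch because the threshold is strict.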

๐Ÿ›ก๏ธ Key Technical Features โ€‹

1. Idempotency (MD5 Hashing)

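Every ingestion attempt generates a deterministic MD5 hash from the raw payload, the message's own timestamp, and source metadata. Because these inputs come from the message itself rather than from the ingestion attempt, re-reading the same SMS or email yields the same hash, so it is never processed twice, even if the sync service restarts.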
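A minimal sketch of the hashing step, assuming the three inputs are serialised to canonical JSON before hashing. The field names and the in-memory `seen` set are hypothetical. The timestamp is taken to be the message's own timestamp, since hashing the ingestion time would give a replayed message a different ID.

```python
import hashlib
import json


def ingestion_id(raw_payload: str, message_ts: str, source: dict) -> str:
    """Deterministic MD5 fingerprint built only from message-derived fields."""
    canonical = json.dumps(
        {"payload": raw_payload, "ts": message_ts, "source": source},
        sort_keys=True,           # stable key order, including nested dicts
        separators=(",", ":"),    # no incidental whitespace
        ensure_ascii=False,
    )
    return hashlib.md5(canonical.encode("utf-8")).hexdigest()


class IdempotentIngestor:
    def __init__(self) -> None:
        self.seen: set[str] = set()  # hypothetical stand-in for a persistent unique index

    def ingest(self, raw_payload: str, message_ts: str, source: dict) -> bool:
        """Return True if the message is new and processed, False if rejected as a repeat."""
        key = ingestion_id(raw_payload, message_ts, source)
        if key in self.seen:
            return False
        self.seen.add(key)
        return True
```

An in-memory set does not survive a restart; for the restart safety described above, the IDs would need to live in durable storage, for example behind a unique constraint in the ledger database.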

2. Tiered Parsing Logic

To maximize speed and minimize costs, we use a tiered approach:

  • Tier 1 (Static): Hardcoded templates for major banks (HDFC, ICICI, etc.).
  • Tier 2 (Heuristics): Algorithmic identification of currency, amounts, and vendors.
  • Tier 3 (AI Fallback): Real-time inference using Gemini Pro for highly complex or non-standard messages.
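The tier ordering can be sketched as a chain of parsers in which each tier runs only when the cheaper tier before it returns nothing. Everything here is illustrative: the Tier 1 template is a made-up pattern rather than a real HDFC or ICICI format, and `llm_fallback` is a caller-supplied function standing in for the Gemini request, so no Gemini SDK calls are shown.

```python
import re
from typing import Callable, Optional

Parsed = dict  # e.g. {"amount": 250.0, "direction": "debit", "tier": "static"}

# Tier 1: hypothetical per-bank template (illustrative, not a real bank format).
STATIC_TEMPLATES = [
    re.compile(r"Rs\.?\s?(?P<amount>[\d,]+(?:\.\d{1,2})?) debited from A/c \w+ at (?P<merchant>[A-Z ]+)"),
]

# Tier 2: generic heuristics for currency, amount, and direction.
AMOUNT_RE = re.compile(r"(?:Rs\.?|INR|₹)\s?([\d,]+(?:\.\d{1,2})?)", re.IGNORECASE)
DIRECTION_RE = re.compile(r"\b(debited|credited)\b", re.IGNORECASE)


def parse_static(msg: str) -> Optional[Parsed]:
    for tpl in STATIC_TEMPLATES:
        m = tpl.search(msg)
        if m:
            return {"amount": float(m["amount"].replace(",", "")),
                    "direction": "debit",  # this template only matches debits
                    "merchant": m["merchant"].strip(),
                    "tier": "static"}
    return None


def parse_heuristic(msg: str) -> Optional[Parsed]:
    amt, direction = AMOUNT_RE.search(msg), DIRECTION_RE.search(msg)
    if amt and direction:
        return {"amount": float(amt.group(1).replace(",", "")),
                "direction": "debit" if direction.group(1).lower() == "debited" else "credit",
                "tier": "heuristic"}
    return None


def parse_tiered(msg: str, llm_fallback: Callable[[str], Optional[Parsed]]) -> Optional[Parsed]:
    """Try the cheap tiers first; call the (hypothetical) LLM wrapper only if both fail."""
    for tier in (parse_static, parse_heuristic):
        result = tier(msg)
        if result is not None:
            return result
    result = llm_fallback(msg)
    if result is not None:
        result["tier"] = "ai"
    return result
```

Because the LLM is only reached after both regex tiers fail, well-formatted alerts from supported banks never incur an inference call.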

3. Precision Deduplication

Hash-based IDs only catch repeats of the same raw input, so WealthFam also uses "Fuzzy Time-Windowing" to identify duplicate transactions that arrive through different sources (e.g., an SMS alert and a bank statement entry for the same coffee purchase).
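The matching rules behind Fuzzy Time-Windowing are not specified here, so the sketch below assumes a plausible set: two records from different sources are treated as the same transaction when account and amount match and their timestamps fall within a window (48 hours here, an arbitrary choice). The `Txn` type and function names are hypothetical. Each existing record can absorb at most one incoming record, so two genuine same-amount purchases are not collapsed into one.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional


@dataclass(frozen=True)
class Txn:
    source: str    # e.g. "sms" or "statement"
    account: str
    amount: float
    when: datetime


def is_fuzzy_duplicate(a: Txn, b: Txn, window: timedelta = timedelta(hours=48)) -> bool:
    """Same account and amount, different source, timestamps within the window."""
    return (
        a.source != b.source
        and a.account == b.account
        and abs(a.amount - b.amount) < 0.005   # treat amounts equal to the paisa
        and abs(a.when - b.when) <= window
    )


def merge_sources(existing: list[Txn], incoming: list[Txn],
                  window: timedelta = timedelta(hours=48)) -> list[Txn]:
    """Add incoming records unless they pair one-to-one with an existing record."""
    merged = list(existing)
    matched: set[int] = set()  # indices into `existing` already paired
    for txn in incoming:
        match: Optional[int] = next(
            (i for i, old in enumerate(existing)
             if i not in matched and is_fuzzy_duplicate(txn, old, window)),
            None,
        )
        if match is None:
            merged.append(txn)
        else:
            matched.add(match)
    return merged
```

With one SMS alert and two identical ₹180 statement lines on the same day, the first statement line pairs with the SMS and the second is kept as a separate purchase.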


*Accuracy Above All · Zero Manual Entry · Precision Integrity*

WealthFam Engineering Hub