# AskEngine — Chrome Extension

**Turn any website's search into a conversation.**

A Chrome extension that injects a floating AI-powered search widget onto any website. Users describe what they're looking for in natural language, and AskEngine reads the page content to find matches.

---

## Installation

1. Open Chrome → `chrome://extensions`
2. Enable **Developer mode** (top-right toggle)
3. Click **Load unpacked** → select this folder
4. Click the AskEngine icon in the toolbar → **Options** to add your Anthropic API key
5. Visit any website — the green ⚡ bubble appears in the bottom-right

**Keyboard shortcut:** `Ctrl+Shift+K` (or `⌘+Shift+K` on Mac) toggles the widget.

---

## Architecture

```
┌─────────────────────────────────────────────┐
│              ANY WEBSITE                     │
│                                              │
│  ┌──────────────────────────────────────┐   │
│  │        Content Script (content.js)    │   │
│  │                                       │   │
│  │  ┌─────────────┐  ┌───────────────┐  │   │
│  │  │  Extractor   │  │  Widget UI    │  │   │
│  │  │  Engine      │  │  (Shadow DOM) │  │   │
│  │  │              │  │               │  │   │
│  │  │ • JSON-LD    │  │ • Chat panel  │  │   │
│  │  │ • Meta tags  │  │ • Suggestions │  │   │
│  │  │ • Adapters   │  │ • Results     │  │   │
│  │  │ • Semantic   │  │ • Input       │  │   │
│  │  │ • Generic    │  │               │  │   │
│  │  │ • Shadow DOM │  │               │  │   │
│  │  │ • Tables     │  │               │  │   │
│  │  └──────┬───────┘  └───────┬───────┘  │   │
│  │         │   Page Context    │ Query    │   │
│  └─────────┼──────────────────┼──────────┘   │
│            │                  │               │
└────────────┼──────────────────┼───────────────┘
             │                  │
    ┌────────▼──────────────────▼────────┐
    │    Background Service Worker        │
    │    (background.js)                  │
    │                                     │
    │  • System prompt construction       │
    │  • Context compression              │
    │  • Anthropic API calls              │
    │  • Response parsing                 │
    │  • Rate limiting                    │
    │  • Settings management              │
    └────────────────┬────────────────────┘
                     │
                     ▼
           ┌─────────────────┐
           │  Anthropic API   │
           │  (Claude)        │
           └─────────────────┘
```

---

## Extraction Pipeline (Priority Order)

The extractor uses a **layered strategy**, checking each method in order and merging results:

### Layer 1: JSON-LD Structured Data
- Parses `<script type="application/ld+json">` tags
- Handles schema.org Product, ItemList, Organization, etc.
- Extracts Microdata (`itemscope`/`itemprop` attributes)
- Extracts RDFa (`typeof`/`property` attributes)
- **Confidence: Highest** — pre-structured by site owners for SEO
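
As a rough illustration of the JSON-LD part of this layer, the sketch below parses the raw text of each `<script type="application/ld+json">` tag and flattens arrays and `@graph` containers into a single entity list. The helper names `parseJsonLdBlocks` and `filterByType` are hypothetical and not taken from the extension's source; the input is assumed to be an array of strings already read from the page.

```javascript
// Hypothetical helper. `rawBlocks` holds the text of each JSON-LD script tag,
// e.g. gathered in a content script with:
//   [...document.querySelectorAll('script[type="application/ld+json"]')]
//     .map((s) => s.textContent)
function parseJsonLdBlocks(rawBlocks) {
  const entities = [];
  for (const raw of rawBlocks) {
    let data;
    try {
      data = JSON.parse(raw);
    } catch {
      continue; // Malformed JSON-LD is common in the wild; skip the block.
    }
    // A block may hold one object, an array, or an object with an @graph array.
    const queue = Array.isArray(data) ? [...data] : [data];
    while (queue.length > 0) {
      const node = queue.shift();
      if (node === null || typeof node !== "object") continue;
      if (Array.isArray(node["@graph"])) queue.push(...node["@graph"]);
      if (node["@type"]) entities.push(node);
    }
  }
  return entities;
}

// @type may be a string or an array of strings, so normalize before comparing.
function filterByType(entities, typeName) {
  return entities.filter((e) => [].concat(e["@type"]).includes(typeName));
}

const sample = [
  '{"@type":"Product","name":"Lamp"}',
  "not valid json",
  '{"@graph":[{"@type":"Organization","name":"Acme"},{"@type":["Product","Thing"],"name":"Desk"}]}',
];
console.log(filterByType(parseJsonLdBlocks(sample), "Product").map((e) => e.name));
```

Skipping unparseable blocks rather than throwing keeps one broken tag from discarding the rest of the page's structured data.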

### Layer 2: Meta Tags
- Open Graph (`og:title`, `og:description`, etc.)
- Twitter Cards
- Standard meta description/keywords
- Article-specific meta tags

### Layer 3: Platform-Specific Adapters
Custom extraction logic for:
- **E-commerce:** Amazon, eBay, Etsy, Shopify, WooCommerce, Walmart, Target, Best Buy
- **Real Estate:** Zillow, Realtor.com, Redfin
- **Jobs:** Indeed, LinkedIn, Glassdoor
- **Food:** Yelp, DoorDash, UberEats, Grubhub
- **Content:** Reddit, Wikipedia, Medium, GitHub, StackOverflow
- **Travel:** Booking.com, Airbnb
- **CMS Detection:** WordPress, Squarespace, Wix

### Layer 4: Semantic HTML Heuristics
- Searches for `article`, `[role="article"]`, `.card`, `.product`, `.listing`, `.result`, etc.
- Finds the selector that returns the most **structurally similar** set of visible elements
- Extracts: name, price, image, link, rating, description, tags, location
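
One way to choose between candidate selectors is to score how uniform each matched set is. The sketch below is an assumption about how such a score could work, not the extension's actual algorithm: it reduces each element to a signature of its child tags and ranks selectors by the share of elements with the most common signature, weighted by match count. The names `similarityScore` and `pickBestSelector` and the `minMatches` threshold are hypothetical.

```javascript
// Signature of one element: its tag plus the ordered tags of its direct children.
function childTagSignature(el) {
  const childTags = Array.from(el.children || [], (c) => c.tagName.toLowerCase());
  return `${el.tagName.toLowerCase()}>${childTags.join(",")}`;
}

// Share of elements (0..1) that carry the most common signature in the set.
function similarityScore(elements) {
  if (elements.length === 0) return 0;
  const counts = new Map();
  for (const el of elements) {
    const sig = childTagSignature(el);
    counts.set(sig, (counts.get(sig) || 0) + 1);
  }
  return Math.max(...counts.values()) / elements.length;
}

// `candidates` maps a selector string to the visible elements it matched.
// Assumed ranking: similarity weighted by match count, so a selector that
// matches very few elements cannot win on a trivially perfect score.
function pickBestSelector(candidates, minMatches = 3) {
  let best = null;
  for (const [selector, elements] of Object.entries(candidates)) {
    if (elements.length < minMatches) continue;
    const score = similarityScore(elements) * elements.length;
    if (best === null || score > best.score) best = { selector, score };
  }
  return best;
}
```

In a content script, each candidate list would come from `document.querySelectorAll(selector)` filtered for visibility; the functions only read `tagName` and `children`, so they also work on plain objects in tests.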

### Layer 5: Generic Listing Detection
- Finds repeated sibling elements in containers (`ul`, `ol`, `.grid`, `main`)
- Structural similarity checking (child tag signatures)
- Table extraction (headers → row data mapping)
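
The table step can be pictured as zipping header text with each row's cell text. The following is a minimal sketch under that assumption; `mapTableRows` is a hypothetical name, and the inputs are assumed to be plain string arrays already read from `<th>` and `<td>` elements.

```javascript
// `headers` is the text of each <th>; `rows` is an array of rows,
// each row being an array of <td> text in column order.
function mapTableRows(headers, rows) {
  // Blank headers get a positional fallback key so no column is dropped.
  const keys = headers.map((h, i) => h.trim() || `column_${i + 1}`);
  return rows
    .filter((cells) => cells.some((c) => c.trim() !== "")) // drop empty rows
    .map((cells) => {
      const record = {};
      keys.forEach((key, i) => {
        record[key] = (cells[i] ?? "").trim(); // short rows yield empty strings
      });
      return record;
    });
}

const records = mapTableRows(
  ["Model", " Price ", ""],
  [["X1", "$499", "In stock"], ["", "", ""], ["X2", "$599"]]
);
console.log(records);
```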

### Layer 6: Main Content Text
- For non-listing pages (articles, docs, blog posts)
- Priority: `<main>` → `<article>` → `#content` → largest text block
- Cleans: removes nav, footer, ads, cookie banners, modals

### Layer 7: Navigation Context
- Breadcrumbs
- Active filters
- Pagination state
- Result counts
- Page heading

---

## Edge Cases Handled

### Dynamic Content / SPAs
- **MutationObserver** watches for significant DOM changes and re-extracts
- **SPA navigation detection** wraps `history.pushState`/`history.replaceState` and listens for the `popstate` event
- Debounced re-extraction (800ms settle time) prevents thrashing
- Full conversation reset on navigation
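
The navigation and debounce behaviour described above could be sketched as follows. This is an illustration under assumptions, not the extension's code: `hookHistory` and `debounce` are hypothetical helpers, and `reExtract` in the wiring comment stands in for whatever re-extraction function the content script exposes.

```javascript
// Wraps pushState/replaceState on a History-like object so `onNavigate`
// fires after each call. Returns a function that restores the originals.
function hookHistory(historyObj, onNavigate) {
  const originals = {};
  for (const method of ["pushState", "replaceState"]) {
    originals[method] = historyObj[method];
    historyObj[method] = function (...args) {
      const result = originals[method].apply(this, args);
      onNavigate(method);
      return result;
    };
  }
  return () => Object.assign(historyObj, originals);
}

// Trailing-edge debounce: `fn` runs once, `waitMs` after the most recent call.
function debounce(fn, waitMs) {
  let timer = null;
  return (...args) => {
    clearTimeout(timer);
    timer = setTimeout(() => fn(...args), waitMs);
  };
}

// Possible wiring in a browser (reExtract is hypothetical):
//   const schedule = debounce(reExtract, 800);
//   hookHistory(window.history, schedule);
//   window.addEventListener("popstate", schedule);
```

Because content scripts run in an isolated world, a wrapper installed there does not see `pushState` calls made by the page's own scripts; to observe those, the wrapper would have to run in the page's main world.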

### Shadow DOM
- Recursive traversal of shadow roots (up to 5 levels deep)
- Extracts semantic items from within shadow boundaries
- Handles Web Components that encapsulate product cards
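
A depth-limited traversal of this kind might look like the sketch below. `collectDeep` is a hypothetical name, and the depth cap is interpreted here as the number of nested shadow boundaries crossed, which is an assumption about what "5 levels deep" means.

```javascript
// Collects descendants of `root` that satisfy `predicate`, descending into
// open shadow roots until `maxDepth` nested shadow boundaries have been crossed.
function collectDeep(root, predicate, maxDepth = 5, depth = 0, found = []) {
  for (const el of Array.from(root.children || [])) {
    if (predicate(el)) found.push(el);
    // Light-DOM children stay at the same shadow depth.
    collectDeep(el, predicate, maxDepth, depth, found);
    // el.shadowRoot is null for closed roots, so only open roots are reachable.
    if (el.shadowRoot && depth < maxDepth) {
      collectDeep(el.shadowRoot, predicate, maxDepth, depth + 1, found);
    }
  }
  return found;
}

// Possible use in a content script (the tag name is illustrative only):
//   const cards = collectDeep(document.body, (el) => el.tagName === "PRODUCT-CARD");
```

The cap guards against pathological nesting; components that attach closed shadow roots remain invisible to this approach.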

### Anti-Scraping Resilience
- Doesn't depend on site-specific CSS class names (tolerates Tailwind utility classes, CSS Modules hashes, and obfuscated names)
- Uses **structural similarity** algorithms instead of class-based selection
- Falls back through 7 layers if any method fails
- Currency detection via regex (handles $, €, £, ¥, ₹)
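
For the currency bullet, a regex along the following lines would cover the five listed symbols. The pattern is an assumption for illustration, not the extension's actual expression: it expects the symbol before the amount and `,` as the grouping separator, so prices written like `1.299,00 €` would need additional patterns. `parsePrice` is a hypothetical helper.

```javascript
// Currency symbol, optional space, then digits with optional "," groups
// and up to two decimals, e.g. "$1,299.99", "₹ 45,000", "€19".
const PRICE_PATTERN = /([$€£¥₹])\s?(\d+(?:,\d{2,3})*(?:\.\d{1,2})?)/;

// Returns the first price found in `text`, or null when none is present.
function parsePrice(text) {
  const match = PRICE_PATTERN.exec(text);
  if (!match) return null;
  return {
    symbol: match[1],
    amount: Number(match[2].replace(/,/g, "")), // strip grouping separators
  };
}

console.log(parsePrice("Now $1,299.99 (was $1,499)"));
console.log(parsePrice("Free shipping"));
```

The symbol is returned as-is rather than mapped to a currency code, since `$` and `¥` are each shared by several currencies.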

### Content Types
- Product listings → item extraction
- Articles/blog posts → content summarization
- Documentation → Q&A
- Tables → structured row extraction
- Image galleries → alt text + metadata
- Mixed content → hybrid extraction

### Style Isolation
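- Widget renders inside **Shadow DOM**, so host-page selectors cannot match its elements
- Widget styles stay scoped to the shadow root and do not leak into the host page (inherited properties such as `font-family` still cross the boundary unless reset on the host element)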
- Works on sites with aggressive CSS resets

### Security
- API key stored in `chrome.storage.sync` (synced to the user's signed-in Chrome profile; note that extension storage is not encrypted by Chrome)
- Key never exposed to page JavaScript (isolated in service worker)
- Content script runs in isolated world
- No eval(), no remote code execution

### Performance
- Extraction typically completes in <100ms
- Lazy extraction (only on widget open)
- Token-aware compression for API calls (keeps context under 4K tokens)
- Items capped at 30 per query to manage costs
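
The item cap and token budget can be combined in a single pass, as in this sketch. It assumes a rough estimate of about 4 characters per token, which is a common heuristic and not an exact tokenizer; `estimateTokens` and `compressItems` are hypothetical names, and serializing each item as one JSON line is an illustrative choice.

```javascript
const MAX_ITEMS = 30;    // per-query item cap from the list above
const MAX_TOKENS = 4000; // context budget from the list above

// Rough heuristic (assumption): about 4 characters per token.
function estimateTokens(text) {
  return Math.ceil(text.length / 4);
}

// Serializes items one per line, stopping at the item cap or when the
// next item would push the estimated total over the token budget.
function compressItems(items, maxItems = MAX_ITEMS, maxTokens = MAX_TOKENS) {
  const kept = [];
  let used = 0;
  for (const item of items.slice(0, maxItems)) {
    const line = JSON.stringify(item);
    const cost = estimateTokens(line);
    if (used + cost > maxTokens) break;
    kept.push(line);
    used += cost;
  }
  return { context: kept.join("\n"), itemCount: kept.length, estimatedTokens: used };
}

const demoItems = Array.from({ length: 50 }, (_, i) => ({ name: `Item ${i}`, price: "$10" }));
console.log(compressItems(demoItems).itemCount);
```

Keeping items in their extracted order means higher-priority layers survive truncation first, assuming the extractor emits them in priority order.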

---

## File Structure

```
askengine-extension/
├── manifest.json          # Extension manifest (V3)
├── background.js          # Service worker — API calls, settings
├── content.js             # Content script — widget UI, user interaction
├── extractor.js           # Extraction engine — DOM scraping, adapters
├── options.html           # Settings page
├── styles/
│   └── widget.css         # Placeholder (styles in Shadow DOM)
├── icons/
│   ├── icon16.png
│   ├── icon48.png
│   ├── icon128.png
│   └── logo.png           # AskEngine logo
└── README.md              # This file
```

---

## Upgrade Path → AskEngine SDK (Business)

The extension demonstrates AskEngine's capability on any site via DOM scraping. For businesses wanting **dramatically better results**, the SDK product offers:

| Feature | Extension (DOM) | SDK (Custom API) |
|---------|----------------|------------------|
| Data source | Real-time DOM scraping | Ingested product catalog |
| Accuracy | Good (page-level) | Exceptional (full catalog) |
| Cross-page search | Current page only | Entire inventory |
| Branding | AskEngine branded | Fully white-label |
| Analytics | None | Full conversion dashboard |

Businesses contact us → we ingest their data → build a custom Claude-powered API → they embed one script tag.

---

## License

Proprietary — AskEngine © 2025
