Shopville / Milk Mocha Bear is an e-commerce reseller operating across 14+ channels. Raw transactional data is pulled from 6 external source systems into Snowflake, transformed into analytics views and materialized tables, and served to Looker Studio via BigQuery. All ETL runs on a self-hosted RHEL 9 server (10.10.1.158) via cron.
Looker Studio's community Snowflake connector adds 12–15 seconds per query due to OAuth overhead. BigQuery is Looker's native engine — filter changes resolve in under 1 second. Nightly sync (12:00 UTC) keeps data fresh enough for all reporting use cases.
GitHub Actions hit its monthly minutes cap in May 2026. All 19 ETL jobs were migrated to server cron. GHA workflows are retained for manual re-runs only (workflow_dispatch). Snowflake Tasks are all SUSPENDED — cron is the sole scheduler.
The Schema Browser tab queries INFORMATION_SCHEMA.COLUMNS from Snowflake every hour — new tables and column changes appear automatically. The written descriptions, architecture notes, and limitation details in this page are hardcoded in TABLE_DESCRIPTIONS and SOURCE_DOCS inside monitor.py. When you add a new table or change a design decision, update those dicts and restart the monitor.
Every marketplace (Amazon, Walmart, eBay) requires unique listing SKUs that differ from Acumatica's canonical SV###### format. The normalization chain runs in every analytics view:
Kit SKUs start with SV9 (e.g. SV9145264CAF). They resolve to their component via T_KIT_COMPONENTS. Used items carry suffixes UA/UVG/ULN/UOB/UDB/UG and are excluded from all inventory and sales views.
All P&L calculations use this fallback chain. ~94.8% of order lines are covered by priority 1:
| # | Source | Fallback Condition | Notes |
|---|---|---|---|
| 1 | T_SALES_VELOCITY.total_cost_cad | — | Acumatica accounting cost, most accurate. ~94.8% coverage. |
| 2 | PO-derived avg landed cost | If SV1 is 0 or NULL | Derived from T_PURCHASE_ORDER_LINES by SKU. |
| 3 | T_STOCK_ITEM_COST.average_cost_cad | If SV1+PO unavailable | Acumatica average cost GI. Preferred over last cost — prevents inflation from one-off purchases. |
| 4 | T_STOCK_ITEM_COST.last_cost_cad | Last resort | Avoid — can be inflated by single high-cost purchase. |
Every order in T_SALES_VELOCITY has two Acumatica fields: customer_name and celigo_fulfillment_channel. All analytics views derive the market from these.
| Condition | Market Label | Notes |
|---|---|---|
celigo_fulfillment_channel IN ('AFN','FBA')+ customer_name ends with .ca | Amazon CA (FBA) | Canadian marketplace fulfilled by Amazon |
FBA + customer_name contains Amazon MX | Amazon MX (NARF) | NARF — fulfilled from US FBA inventory, MXN→CAD |
FBA + customer_name contains Amazon UK | Amazon UK (FBA) | EU marketplace, GBP→CAD |
FBA + customer_name contains Amazon JP | Amazon JP (FBA) | Japan, JPY→CAD |
| FBA + else | Amazon US (FBA) | Default FBA = US marketplace |
celigo_fulfillment_channel = 'WFS' | Walmart | Walmart Fulfillment Services |
MFN + customer_name starts with Amazon | Amazon MFN | Merchant-fulfilled Amazon order (we ship it) |
MFN + customer_name contains Shopville (CAN) or Shopville Select | Shopify | Shopify storefront orders (Celigo integration) |
MFN + customer_name starts with Ebay | eBay | eBay direct or ChannelEngine orders |
MFN + customer_name starts with Walmart | Walmart MFN | Walmart marketplace, merchant-fulfilled |
MFN + customer_name starts with Best Buy | Best Buy US / CA | 15% flat fee in V_ORDER_DETAIL |
MFN + customer_name starts with Target | Target | 15% flat fee in V_ORDER_DETAIL |
MFN + customer_name starts with Macy | Macy's | 15% flat fee in V_ORDER_DETAIL |
customer_name, not the fulfillment channel code.Used in V_STOCK_DETAIL, V_INVENTORY_PLAN, and V_STOCK_DETAIL_BY_WAREHOUSE.
| Code | Type | Description |
|---|---|---|
200 | Physical | Main warehouse — primary active inventory |
200R | Physical | Main warehouse — reserve locations only. qty_available = 0 but has real value — always use total_value |
200T | Physical | TCGPlayer inventory. Same caveat: qty_available = 0 |
205 | Physical | Deprecated — no longer in active use |
207 | Physical | Complete Shipping Solutions warehouse |
Trophy Room | Physical (special) | High-value collectibles (~$1.44M CAD). Tracked separately as qty_trophy / trophy_value — not included in qty_avail_physical |
FBA-US | FBA | Amazon US fulfillment centers |
FBA-CA | FBA | Amazon Canada fulfillment centers |
FBA-UK | FBA | Amazon UK fulfillment centers |
AWD-US | AWD | Amazon Warehousing & Distribution — bulk storage upstream of FBA. ~79 SKUs, ~$118K CAD |
WFS-US | WFS | Walmart Fulfillment Services |
V_STOCK_DETAIL aggregated columns: qty_avail_physical = 200+200R+200T+205+207 combined. qty_res_* = reserved (committed to orders, not yet shipped). total_value_cad includes all locations except Trophy Room. Note: 200R (reserve) and 200T (TCGPlayer) show qty_available = 0 in SkuVault but hold real stock — total_value accounts for this.
Applies to T_PRODUCT_FLOW_BY_CATEGORY and T_PRODUCT_FLOW_BY_VENDOR. Each row = one category (or vendor) × one calendar month.
| Column | Meaning | Source |
|---|---|---|
open_po_cad | Purchase orders in Open status in Acumatica | T_PURCHASE_ORDER_LINES |
on_hold_po_cad | POs in On Hold status | T_PURCHASE_ORDER_LINES |
paid_po_cad | POs in Completed / Closed status | T_PURCHASE_ORDER_LINES |
| ⚠ Important: All PO statuses represent received goods. Acumatica POs are only entered after physical receipt, so Open + On Hold + Paid = total inventory brought in that month. | ||
revenue_cad | Net sales revenue from T_SALES_VELOCITY | T_SALES_VELOCITY |
cogs_cad | Cost of goods sold using the 4-priority COGS waterfall | T_SALES_VELOCITY → T_STOCK_ITEM_COST |
selling_fees_cad | Amazon Commission + Walmart "Commission on Product" + flat-rate fees | T_ORDER_FEES (Amazon + Walmart) |
fulfillment_fees_cad | FBA per-unit + per-order fees + Walmart WFS Item Fees | T_ORDER_FEES (Amazon + Walmart) |
shipping_cad | ShipStation label costs (MFN orders only — FBA/WFS shipping is Amazon/Walmart's cost) | F_LABELS_USA + F_LABELS_CANADA |
profit_cad | revenue − cogs − selling_fees − fulfillment_fees − shipping | Computed |
margin_pct | profit / revenue stored as decimal. Never SUM this in Looker. Use calculated field PROFIT_CAD / REVENUE_CAD instead. | Computed |
inv_gain_loss_cad | (open + on_hold + paid POs) − cogs. Positive = bought more than sold (inventory building). Negative = drawing down stock. | Computed |
| Caveat | Detail | Workaround |
|---|---|---|
| Amazon fee settlement lag | Amazon settles fees 5–25 days after shipment. Orders < 7 days old show 0% fee coverage (normal — fees haven't posted yet). Coverage by age: 0–7d = 0–6%, 8–14d = 12–89%, 15–20d = 65–91%, 21+ days = 94–98% ✅ | Filter to orders > 21 days old for reliable fee data. Do not interpret missing fees on recent orders as a pipeline failure. |
| Shopify fees excluded from net profit | T_SHOPIFY_FEES (Shopify Payments ~2.9% + $0.30/order) is not yet joined into V_ORDER_DETAIL or T_SALES_DETAIL. Shopify channel profit is slightly overstated. |
Known gap. Workaround: estimate 3% of Shopify revenue as unaccounted payment processing fees. |
| eBay ChannelEngine flat-rate | Shopville (CAN) eBay orders via ChannelEngine use internal numeric IDs (e.g. 6551727439957) that never match eBay Finances API short IDs (e.g. 27-14547-30595). These get a 14% flat-rate estimate in V_ORDER_DETAIL. |
Direct Ebay (USD) orders match at 100%. The 14% estimate is close to actual eBay fee rates. |
| AWD inventory $0 cost | ~30 AWD SKUs have no matching cost in T_STOCK_ITEM_COST. These show $0 value in V_INVENTORY_PLAN even though stock exists. |
Filter WHERE total_value > 0 if aggregating inventory value. The $0 rows represent real physical stock, just uncosted. |
| Move transactions in inventory history | SkuVault "Move" transactions are internal transfers between locations. When a move splits across midnight, it creates an artificial spike/dip in daily inventory value. These are explicitly excluded from REFRESH_INVENTORY_VALUE_HISTORY(). |
If you see a large single-day value change in inventory_value_history, check if it's a real change or a Move transaction edge case. |
| 200R / 200T qty_available = 0 | Returns (200R) and Transit (200T) locations show qty_available = 0 in SkuVault, but the stock has real value. Using available_value in V_INVENTORY_PLAN understates inventory. |
Always use total_value (not available_value) when summing inventory value. |
| Kit SKUs (SV9...) | SKUs starting with SV9 are multi-packs or bundles (e.g. SV9145264 = 2× SV145264). They have no PO history of their own — vendor and category are inherited from the component SKU via T_KIT_COMPONENTS. |
If a kit SKU shows no vendor, check T_KIT_COMPONENTS. Inferred kits have revision = 'INFERRED'. |
| Used-item SKU exclusions | SKUs with suffixes UA, UVG, ULN, UOB, UDB, UG (e.g. SV145264UA) are used/refurbished items. They are excluded from all inventory and sales analytics views. |
If you need used-item data, query T_SALES_VELOCITY directly without the used-item filter. |
| Decision | Why We Did It | Alternative Considered | Trade-off |
|---|---|---|---|
| Materialized tables for Looker | V_STOCK_DETAIL has 77 cols + complex joins — direct queries take 10–20s in Looker | Query Snowflake views directly | Data is 24h stale (refreshed nightly). Acceptable for reporting. |
| Server cron scheduler | GHA hit minutes cap May 2026. Self-hosted gives unlimited runtime + cheaper. | GitHub Actions, Snowflake Tasks, Airflow | Requires server uptime. Monitor at :8080 for visibility. |
| Snowflake Tasks: ALL SUSPENDED | Were duplicating server cron runs and burning credits | Tasks as sole scheduler | Snowflake EBUSY errors in sandbox made external API calls unreliable in Tasks anyway. |
| BigQuery sync nightly | Native Looker connector; sub-second filter response vs 12–15s Snowflake | Direct Snowflake Looker connection | Extra complexity in sync pipeline. Data is always 1 day behind. |
| REST API for Acumatica | OData GI has hard 365-day rolling window — couldn't load multi-year history | OData GI (still used for cost/POs) | REST API requires cookie session auth. ThreadPoolExecutor for parallel sub-partitions. |
| Two-tier Alt SKU loading | Full reload (~334K items, ~35 min) too slow for daily. Delta takes ~60s. | Full reload daily | Sunday full reload as safety net for deletions. Delta every 30 min keeps SKUs current. |
| QUALIFY dedup guard on T_SALES_VELOCITY | Apr 2026: --no-truncate flag on full-load script doubled 71.5K rows, overstated 2026 revenue by ~$5M | Trust source data uniqueness | Slight query overhead. Unique key: (order_type, order_nbr, line_nbr). |
Click any table to jump to its schema. All times UTC.
Views — computed on demand from source tables, no storage cost
Materialized tables — rebuilt nightly, faster Looker queries
Active in Looker Connected, 0 charts Synced, not in Looker yet
inventory_plan table (ds43) is the current replacement. The Apps Script source should be migrated off.
stock_detail (V_STOCK_DETAIL snapshot — SKU health, per-location qty)product_flow_by_category (monthly P&L, filter by METRIC_MONTH)product_flow_by_vendor (monthly P&L by supplier)po_lines_enriched (clickable Acumatica PO links)order_detail (rolling 30d · 14 channels · full fee breakdown)fba_inbound_pipeline + fba_pipeline_stagesinventory_value_history (daily value chart by category)All times UTC. Logs at /var/log/shopville/. Monitor at http://10.10.1.158:8080.
| UTC | Job | What it does | Target table(s) |
|---|---|---|---|
| 09:00 | load_skuvault | Warehouse inventory via Snowflake proc | SKUVAULT.CURATED.T_PRODUCT_WAREHOUSE_INVENTORY |
| 09:00 | load_amazon_order_fees | Amazon fees US/CA/UK/MX (incremental watermark) | AMAZON.CURATED.T_ORDER_FEES |
| 09:00 | load_kit_components | Kit BOM mappings (daily) | ACUMATICA.CURATED.T_KIT_COMPONENTS |
| 09:05–07 | load_acumatica_procs | Sales Velocity + Stock Cost + POs | T_SALES_VELOCITY, T_STOCK_ITEM_COST, T_PURCHASE_ORDER_LINES |
| every 30m | load_acumatica_alt_skus_delta | Delta alt-SKU update via REST (~60s) | ACUMATICA.CURATED.T_STOCK_ITEM_ALTS |
| Sun 03:00 | load_acumatica_alt_skus | Full alt-SKU reload (~35 min) | ACUMATICA.CURATED.T_STOCK_ITEM_ALTS |
| 09:30 | load_amazon_fba_inventory | FBA inventory US/CA/UK | AMAZON.CURATED.T_FBA_INVENTORY |
| 09:31 | load_amazon_awd_inventory | AWD inventory US | AMAZON.CURATED.T_AWD_INVENTORY |
| 09:45 | load_amazon_fba_shipments | FBA inbound shipments US/CA/UK | T_FBA_SHIPMENTS, T_FBA_SHIPMENT_STATUS_LOG |
| 10:00 | load_ebay_order_fees | eBay Sell Finances API US/CA | AMAZON.CURATED.T_EBAY_ORDER_FEES |
| 10:15 | load_walmart_wfs_inventory | WFS inventory US | WALMART.CURATED.T_WFS_INVENTORY |
| 10:20 | load_walmart_order_fees | Walmart reconciliation fees | WALMART.CURATED.T_ORDER_FEES |
| 10:30 | load_shipstation | Shipping labels USA + Canada | SHIPSTATION.CURATED.F_LABELS_USA / F_LABELS_CANADA |
| 11:00 | load_skuvault_transactions | SkuVault transaction history | SKUVAULT.CURATED.T_TRANSACTIONS |
| 11:00 | refresh_product_flow | Rebuild T_PRODUCT_FLOW tables | ANALYTICS.CURATED.T_PRODUCT_FLOW_BY_CATEGORY/VENDOR |
| 11:30 | refresh_sales_detail | Rebuild T_SALES_DETAIL (CTAS swap) | ANALYTICS.CURATED.T_SALES_DETAIL |
| 11:45 | refresh_inventory_history | Rebuild T_INVENTORY_VALUE_HISTORY | ANALYTICS.CURATED.T_INVENTORY_VALUE_HISTORY |
| 11:50 | refresh_fba_pipeline | Rebuild T_FBA_INBOUND_PIPELINE | ANALYTICS.CURATED.T_FBA_INBOUND_PIPELINE |
| 12:00 | sync_to_bigquery | Sync all 19 tables → BigQuery | shopville dataset (GCP) |
| 13:00 | fetch_lego_cpsc_docs | LEGO compliance portal → Acumatica | ACUMATICA.CURATED.T_CPC_DOCUMENTS |
GCP project: shopville-reporting · Dataset: shopville · Service account: looker-bq-sync
| BigQuery table | Snowflake source | Rows | Looker Studio page |
|---|---|---|---|
| sales_detail | T_SALES_DETAIL | ~904K | Sales detail, channel analysis, SKU velocity |
| product_flow_by_category | T_PRODUCT_FLOW_BY_CATEGORY | ~3.6K | Product Flow – Category page |
| product_flow_by_vendor | T_PRODUCT_FLOW_BY_VENDOR | ~7.3K | Product Flow – Vendor page |
| stock_detail | V_STOCK_DETAIL | ~39K | Inventory Report, CPC compliance report |
| po_lines_enriched | V_PO_LINES_ENRICHED | ~43K | PO Lines page (with clickable Acumatica links) |
| stock_health | V_STOCK_HEALTH | ~39K | Supplementary inventory (subset of stock_detail) |
| inventory_plan | V_INVENTORY_PLAN | ~91K | Inventory by location (AWD, Trophy rows) |
| inventory_value_history | T_INVENTORY_VALUE_HISTORY | ~92K | Historical inventory value chart |
| fba_inbound_pipeline | T_FBA_INBOUND_PIPELINE | ~123K | FBA Pipeline dashboard (urgency, velocity) |
| fba_pipeline_stages | V_FBA_PIPELINE_STAGES | ~10K | FBA funnel / stage bar charts |
| order_detail | V_ORDER_DETAIL | ~185K | All Orders report (rolling 30d, all 14 channels + full fee breakdown) |
Margin %: always use calculated field PROFIT_CAD / REVENUE_CAD — never SUM the stored MARGIN_PCT column (it would sum percentages, not recompute). Date filter: set METRIC_MONTH as Date Range dimension on Product Flow pages. CPC links: calculated field HYPERLINK(cpc_doc_url, cpc_doc_filename) on stock_detail.
All reporting data lives in two places: Snowflake (source of truth, full history) and BigQuery (nightly snapshot, used by Looker Studio). For most analytics work, use BigQuery — it's faster and cheaper. Use Snowflake when you need real-time data, full history beyond the BigQuery snapshot, or access to staging/raw tables.
Looker Studio / dashboards → BigQuery. Ad-hoc SQL, Python analysis, fresh data → Snowflake. Raw source system data → Snowflake source DBs (ACUMATICA, AMAZON, etc.).
Account: ieihetm-cj97882.snowflakecomputing.com · Warehouse: SHOPVILLE_WH · Role: ACCOUNTADMIN
Install: pip install snowflake-connector-python pandas
Install: docs.snowflake.com/snowsql
Project: shopville-reporting · Dataset: shopville · Updated nightly at 12:00 UTC
Install: pip install google-cloud-bigquery pandas pyarrow
Credentials: ask Chris for the service account JSON or use your GCP account.
Install: part of Google Cloud SDK
All dashboards use BigQuery as the data source. To add a new report:
Copy-paste starting points for the most common analytics asks. All queries work in both Snowflake (fully-qualified names) and BigQuery (use backtick names shown in comments). Replace date ranges as needed.
Best table: sales_detail in BigQuery or ANALYTICS.CURATED.T_SALES_DETAIL in Snowflake.
Best table: product_flow_by_category — pre-aggregated, fast. Use METRIC_MONTH for date filtering.
Best table: inventory_plan (one row per SKU × location). Use total_value, not available_value.
Best table: order_detail — one row per order with all fees. Covers all 14 channels.
Best table: fba_inbound_pipeline — pre-calculated urgency, velocity, and days of inventory.
Joins sales_detail with stock_detail for a complete view: revenue, profit, margin, and current inventory in one query.
| I want to know… | Best BigQuery table | Key columns |
|---|---|---|
| What's in stock right now, by location | stock_detail | sku, qty_avail_physical, qty_avail_fba_us, total_value_cad |
| What sold and what profit did we make | sales_detail | base_sku, channel, order_date, line_revenue_cad, cogs_cad, net_profit_cad |
| P&L by category or vendor over time | product_flow_by_category / product_flow_by_vendor | metric_month, revenue_cad, profit_cad, margin_pct |
| All orders with complete fee breakdown | order_detail | market, order_date, revenue_cad, total_marketplace_fees_cad, net_profit_cad |
| FBA stock levels and restock urgency | fba_inbound_pipeline | base_sku, marketplace_country, stock_urgency, days_of_inventory_fba |
| Inventory value trend over time | inventory_value_history | activity_date, category, inventory_value_cad |
| Monthly units bought and sold per SKU | units_flow | month_date, base_sku, units_sold, units_purchased, revenue_cad |
| All-time SKU profitability ranking | product_summary | sku, all_time_sales, all_time_profit, sales_365d |
| Purchase orders and PO status | po_lines_enriched | order_nbr, vendor_name, category, status, extended_cost_cad |
| Inventory detail by warehouse | stock_detail_by_warehouse | warehouse_code, sku, qty_on_hand, total_value |
Only needed if you're building a new loader or debugging ETL. All credentials are in /opt/shopville/.env on the server.
| Source | Base URL | Auth Method | Env Vars |
|---|---|---|---|
| Acumatica OData | shopville.acumatica.com/odata/shopville/ | HTTP Basic (username/password) | ACUMATICA_ODATA_USERNAME, ACUMATICA_ODATA_PASSWORD |
| Acumatica REST | shopville.acumatica.com/entity/Default/25.200.001/ | Cookie session — POST /entity/auth/login with JSON body | ACUMATICA_REST_USERNAME, ACUMATICA_REST_PASSWORD |
| SkuVault | api.skuvault.com | POST email+password → TenantToken + UserToken headers | SKUVAULT_EMAIL, SKUVAULT_PASSWORD |
| Amazon SP-API (US/CA/MX) | sellingpartnerapi-na.amazon.com | LWA OAuth: POST client_id+secret+refresh_token → access_token | AMAZON_COM_CLIENT_ID/SECRET/REFRESH_TOKEN, AMAZON_CA_REFRESH_TOKEN |
| Amazon SP-API (UK) | sellingpartnerapi-eu.amazon.com | Same LWA flow, shares US client ID/secret | AMAZON_UK_REFRESH_TOKEN |
| Walmart | marketplace.walmartapis.com | Basic base64(clientId:secret) + WM_SEC.ACCESS_TOKEN bearer | WALMART_CLIENT_ID, WALMART_CLIENT_SECRET |
| ShipStation (US) | ssapi.shipstation.com | HTTP Basic (API key as user, "X" as password) | SHIPSTATION_API_KEY |
| ShipStation (CA) | ssapi.shipstation.com | HTTP Basic (separate CA account key) | SHIPSTATION_CA_API_KEY |
| eBay Finances | apiz.ebay.com/sell/finances/v1 | OAuth 2.0 bearer token (per-account) | EBAY_US_*, EBAY_CA_* |
| Bank of Canada FX | www.bankofcanada.ca/valet/observations/ | None (public API) | — |
| LEGO Compliance | ecert.legal.services.lego.com/api/v1/ | x-api-key header (rotates — check Chrome DevTools if 401) | LEGO_API_KEY (hardcoded in fetch_lego_cpsc_docs.py) |