Requirements Clarification Questions

I detected some contradictions and ambiguities in your responses that need clarification before proceeding.

Instructions:

  • Answer each question by filling in the letter choice after the Answer: tag
  • Let me know when you're done

Contradiction 1: Database Technology

Issue: Your original request mentioned "postgresql database" but Q9 reveals you're using MariaDB on RDS.

This is significant because:

  • dbt has different adapters and feature support for PostgreSQL vs MariaDB
  • MariaDB doesn't have JSONB (uses JSON with different performance characteristics)
  • Query optimization and indexing strategies differ
  • Aurora PostgreSQL and Aurora MySQL are separate products; Aurora has no MariaDB-compatible edition, so moving to either one is a migration
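
As a rough illustration of the JSON and indexing differences listed above, the sketch below holds one possible DDL variant per engine for a hypothetical `product_raw` staging table. The table, column, and index names are assumptions made for illustration and are not taken from the current schema. The MariaDB variant indexes a single JSON attribute through a generated column, because MariaDB's JSON type is an alias for LONGTEXT and offers no counterpart to a PostgreSQL GIN index on JSONB.

```python
# Hypothetical DDL per engine. Table, column, and index names are
# illustrative assumptions, not part of the existing Blower schema.
DDL_BY_ENGINE = {
    "postgresql": [
        "CREATE TABLE product_raw ("
        " id BIGINT PRIMARY KEY,"
        " attributes JSONB NOT NULL)",
        # A GIN index lets PostgreSQL search inside the JSONB document.
        "CREATE INDEX product_raw_attributes_gin"
        " ON product_raw USING GIN (attributes)",
    ],
    "mariadb": [
        # JSON is stored as text; a virtual generated column exposes one
        # attribute so that it can be indexed like a normal column.
        "CREATE TABLE product_raw ("
        " id BIGINT PRIMARY KEY,"
        " attributes JSON NOT NULL,"
        " brand VARCHAR(255) AS (JSON_VALUE(attributes, '$.brand')) VIRTUAL,"
        " INDEX idx_brand (brand))",
    ],
}


def ddl_for(engine):
    """Return the list of DDL statements for the given engine name."""
    try:
        return DDL_BY_ENGINE[engine.lower()]
    except KeyError:
        raise ValueError(f"unsupported engine: {engine!r}") from None
```

Calling `ddl_for("mariadb")` or `ddl_for("postgresql")` returns the statements for that engine, which makes the schema differences between options A/B and C/D easier to compare side by side.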

Clarification Question 1

What database technology should this project use?

  A) MariaDB on RDS (current infrastructure, stay with what we have)
  B) Aurora MySQL (upgrade from MariaDB, better performance/scalability)
  C) PostgreSQL on RDS (switch to PostgreSQL as originally mentioned)
  D) Aurora PostgreSQL (switch to PostgreSQL with Aurora benefits)
  E) Other (please describe after the Answer: tag below)

Answer:


Contradiction 2: Timeline vs Complexity

Issue: Q18 indicates "Urgent: 2-4 weeks" but several factors suggest this is very aggressive:

  • Q10: Transformation complexity unknown (needs analysis)
  • Q12: Comprehensive monitoring required (significant effort)
  • Q19: Team familiar with tools but "never this way" (learning curve)
  • Q13: Data parity validation testing (time-consuming)
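
To give a sense of the effort behind the data parity validation mentioned for Q13, here is a minimal sketch that compares rows from the legacy output with rows from the new pipeline. The function names, the `key` column, and the compared `columns` are assumptions for illustration; a real check would read both sides from the database rather than from in-memory lists.

```python
import hashlib


def row_fingerprint(row, columns):
    """Hash the selected columns of a row into a stable fingerprint.

    Note: None and the empty string hash the same way in this sketch.
    """
    joined = "\x1f".join(
        "" if row.get(c) is None else str(row.get(c)) for c in columns
    )
    return hashlib.sha256(joined.encode("utf-8")).hexdigest()


def parity_report(legacy_rows, new_rows, key, columns):
    """Compare two row sets by key and report where they differ."""
    legacy = {r[key]: row_fingerprint(r, columns) for r in legacy_rows}
    new = {r[key]: row_fingerprint(r, columns) for r in new_rows}
    shared = legacy.keys() & new.keys()
    return {
        "missing_in_new": sorted(legacy.keys() - new.keys()),
        "unexpected_in_new": sorted(new.keys() - legacy.keys()),
        "mismatched": sorted(k for k in shared if legacy[k] != new[k]),
    }
```

An empty list under all three keys would mean the two outputs agree on the compared columns. Even in this reduced form, deciding which columns to compare and how to treat nulls and formatting differences tends to take time, which is part of why the timeline looks aggressive.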

Clarification Question 2

Given the complexity, what is the realistic timeline expectation?

  A) 2-4 weeks for MVP (basic pipeline, minimal monitoring, manual validation)
  B) 4-6 weeks for production-ready (full pipeline, standard monitoring, automated testing)
  C) 6-8 weeks for comprehensive solution (full pipeline, comprehensive monitoring, complete testing)
  D) Phased approach: MVP in 2-4 weeks, then iterate to add monitoring/testing
  E) Other (please describe after the Answer: tag below)

Answer:


Contradiction 3: Performance Requirements

Issue: Q4 says "hourly updates acceptable" but your original request emphasized moving from "nightly full load" to "always delta load" (near real-time).
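
One way to separate the two concerns is to note that a delta load describes what is processed, while freshness describes how often the load runs. The sketch below shows a watermark-based delta selection under assumed names (`updated_at`, `id`, `select_delta`, `apply_upserts` are all hypothetical); the same logic could run hourly or every few minutes without changing.

```python
from datetime import datetime, timedelta, timezone


def select_delta(records, last_watermark):
    """Return records changed after last_watermark and the new watermark."""
    changed = [r for r in records if r["updated_at"] > last_watermark]
    new_watermark = max(
        (r["updated_at"] for r in changed), default=last_watermark
    )
    return changed, new_watermark


def apply_upserts(target, changed):
    """Insert or overwrite changed records in target, a dict keyed by id."""
    for record in changed:
        target[record["id"]] = record
    return target


# Example run with three source records and a watermark at the first one.
start = datetime(2024, 1, 1, tzinfo=timezone.utc)
source = [
    {"id": 1, "updated_at": start},
    {"id": 2, "updated_at": start + timedelta(hours=1)},
    {"id": 3, "updated_at": start + timedelta(hours=2)},
]
changed, watermark = select_delta(source, last_watermark=start)
target = apply_upserts({1: source[0]}, changed)
```

In this run only records 2 and 3 are selected, and a second call with the returned watermark selects nothing until new changes arrive.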

Clarification Question 3

What is the actual performance requirement for data freshness?

  A) Near real-time (< 5 minutes) - process files as they arrive
  B) Frequent batch (every 15-30 minutes) - scheduled processing
  C) Hourly batch - acceptable delay for downstream systems
  D) Flexible - start with hourly, optimize to near real-time later
  E) Other (please describe after the Answer: tag below)

Answer:


Ambiguity 1: Transformation Complexity Analysis

Issue: Q10 indicates transformation complexity is unknown and needs analysis.

Clarification Question 4

How should we handle the unknown transformation complexity?

  A) Provide Blower schema now, analyze transformations before design phase
  B) Start with simple 1:1 mapping, iterate based on downstream system feedback
  C) Reverse engineer current Blower PHP transformations to understand logic
  D) Work with downstream teams to define required transformations
  E) Other (please describe after the Answer: tag below)

Answer:


Ambiguity 2: Monitoring Scope vs Timeline

Issue: Q12 requests comprehensive monitoring, but Q18 sets an urgent 2-4 week timeline.
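
For scoping purposes, the data quality checks named in option B of the question below (record counts, schema validation) can be quite small. The following is a minimal sketch with hypothetical function names and column types; it assumes rows are available as dictionaries and could run as a step after each load.

```python
def check_record_count(source_count, loaded_count, tolerance=0):
    """True when the loaded row count is within `tolerance` of the source."""
    return abs(source_count - loaded_count) <= tolerance


def check_schema(rows, expected_types):
    """Return a list of problems; an empty list means the batch passed.

    expected_types maps column name to the Python type expected for
    non-null values in that column.
    """
    problems = []
    for index, row in enumerate(rows):
        for column, expected in expected_types.items():
            if column not in row:
                problems.append(f"row {index}: missing column {column!r}")
            elif row[column] is not None and not isinstance(row[column], expected):
                problems.append(
                    f"row {index}: {column!r} is {type(row[column]).__name__}, "
                    f"expected {expected.__name__}"
                )
    return problems
```

Checks at this level are cheap to build, whereas metrics, traces, and SLA tracking add considerably more work, which is the trade-off the question is trying to settle.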

Clarification Question 5

What monitoring capabilities are required for initial launch vs future iterations?

  A) MVP: Basic Airflow task monitoring only
  B) MVP: Airflow monitoring + data quality checks (record counts, schema validation)
  C) MVP: Full comprehensive monitoring (metrics, logs, traces, SLAs)
  D) Phased: Start with basic, add comprehensive monitoring in phase 2
  E) Other (please describe after the Answer: tag below)

Answer:


Ambiguity 3: Product vs Item Data Model

Issue: Q11 mentions "our current process deals with items our new files have products and items are children of products"

This is a significant data model change that affects:

  • Database schema design
  • Downstream system compatibility
  • Transformation logic complexity
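
To make the hierarchy change concrete, the sketch below models a Product table, an Item table that references it, and a flattened view that keeps an item-centric shape for existing consumers. It uses an in-memory SQLite database purely as a stand-in for the real engine, and every table, column, and value is a hypothetical example rather than the actual Blower schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE product (
        product_id INTEGER PRIMARY KEY,
        name       TEXT NOT NULL
    );
    CREATE TABLE item (
        item_id    INTEGER PRIMARY KEY,
        product_id INTEGER NOT NULL REFERENCES product(product_id),
        sku        TEXT NOT NULL
    );
    -- Flattened, item-centric view for consumers that expect one row per item.
    CREATE VIEW item_flat AS
    SELECT i.item_id, i.sku, p.product_id, p.name AS product_name
    FROM item AS i
    JOIN product AS p ON p.product_id = i.product_id;
    """
)

# One product with two child items.
conn.execute("INSERT INTO product (product_id, name) VALUES (1, 'Desk fan')")
conn.executemany(
    "INSERT INTO item (item_id, product_id, sku) VALUES (?, ?, ?)",
    [(10, 1, "FAN-WHT"), (11, 1, "FAN-BLK")],
)

rows = conn.execute(
    "SELECT item_id, sku, product_name FROM item_flat ORDER BY item_id"
).fetchall()
```

Here `rows` holds one entry per item with the parent product's name repeated on each, which is roughly what a backward-compatible item view would expose. Whether a view like this is enough depends on what the downstream systems actually read.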

Clarification Question 6

How should the Product → Item hierarchy be handled for downstream compatibility?

  A) Flatten Products to Items (maintain current item-centric model)
  B) Introduce Product table, Items reference Products (new hierarchy)
  C) Dual model: Product table + flattened Item view for backward compatibility
  D) Need to analyze downstream systems to determine approach
  E) Other (please describe after the Answer: tag below)

Answer:


Ambiguity 4: Blower Schema Documentation

Issue: Q1 says you can provide schema documentation. This is critical for design.

Clarification Question 7

When can you provide the Blower database schema documentation?

  A) Available now - can share immediately
  B) Available within 1-2 days
  C) Need 1 week to document/extract
  D) Will provide incrementally as we design each component
  E) Other (please describe after the Answer: tag below)

Answer:


Ambiguity 5: Multiple Downstream Systems

Issue: Q2 mentions "E-commerce platforms (Magento, Shopware) - they expect specific table structures" plus "Algolia indexes and possibly some other systems"

Clarification Question 8

Should the pipeline output support all downstream systems or focus on specific ones?

  A) Single unified schema that all systems consume
  B) Multiple output schemas (one per downstream system type)
  C) Focus on e-commerce platforms first, add others later
  D) Need to inventory all downstream systems before deciding
  E) Other (please describe after the Answer: tag below)

Answer: