Thursday, March 12, 2015

Data repository and modeling technique selection


RDBMS? NoSQL? Data warehouse?

ER? Dimensional? Star? Snowflake? Constellation?

Answer: a mix of data repositories, each with the modeling technique that suits it.

·        Understand the traits of data integrity you need.

o   Do you need ACID properties? (Atomicity, Consistency, Isolation, Durability)

o   Do you need BASE properties? (Basically Available, Soft state, Eventual consistency)

o   Have you considered the CAP theorem?

o   Are you using different isolation levels?

o   Do you need an in-memory repository? (e.g. memcached)

·        Choose your traits from the CAP theorem.

o   Consistency: every read returns the most recent write.

o   Availability: every node that has not failed always answers queries.

o   Partition tolerance: the system keeps working even if the connections between nodes are down. While a partition lasts, only one of the other two promises (C or A) can be kept, so decide which one you give up.

·        Understand the properties of each data repository.

o   E.g. for DW: subject-oriented, integrated, time-variant, and nonvolatile (from one source)

·        OLTP vs OLAP?

·        Types of NoSQL (schemaless, or rather implicit-schema) stores

o   Key-Value

o   Document

o   Column-family

o   Graph (some, such as Neo4j, support ACID transactions)
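
To make the ACID question in the checklist above concrete, here is a minimal sketch of atomicity using Python's built-in sqlite3 module. The accounts table, the transfer helper, and the sample balances are all made up for illustration; the point is only that both updates commit together or not at all.

```python
import sqlite3

def transfer(conn, src, dst, amount):
    """Move `amount` from src to dst; both updates commit, or neither does."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE name = ?",
                (amount, src),
            )
            conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE name = ?",
                (amount, dst),
            )
            (balance,) = conn.execute(
                "SELECT balance FROM accounts WHERE name = ?", (src,)
            ).fetchone()
            if balance < 0:
                # Raising here undoes BOTH updates above.
                raise ValueError("insufficient funds")
        return True
    except ValueError:
        return False

def balances(conn):
    """Return the current balances as a {name: balance} dict."""
    return dict(conn.execute("SELECT name, balance FROM accounts ORDER BY name"))

# Hypothetical sample data, held in an in-memory database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (name TEXT PRIMARY KEY, balance INTEGER NOT NULL)")
conn.executemany("INSERT INTO accounts VALUES (?, ?)", [("alice", 100), ("bob", 50)])
conn.commit()

print(transfer(conn, "alice", "bob", 30))   # enough funds: committed
print(transfer(conn, "alice", "bob", 500))  # overdraft: rolled back
print(balances(conn))
```

If you find yourself needing this kind of all-or-nothing guarantee across several records, that is a strong hint towards an RDBMS (or another store that offers transactions) for that part of your data.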

Suggested data repository selection based on CAP model:

 

Decision makers




Notes:

·        Modeling naturally follows the DB selection.

·        An RDBMS entails ER modeling. Decide which level of normalization to adhere to. Third normal form (3NF) or Boyce-Codd normal form (BCNF, colloquially called 3.5NF) is usually the highest level recommended, although normal forms go up to sixth (6NF).

·        Dimensional modeling is usually a better fit for a data warehouse than a standard normalized data model.

·        You will most likely end up using one or more RDBMSs alongside one or more NoSQL stores and a DW.
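
As a rough illustration of the dimensional modeling note above, here is a small star schema sketched with Python's built-in sqlite3 module. The table names (dim_date, dim_product, fact_sales), the columns, and the sample rows are assumptions made up for this example; a real warehouse would have its own dimensions and measures.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Dimension tables: descriptive context, deliberately denormalized.
CREATE TABLE dim_date (
    date_key INTEGER PRIMARY KEY,   -- e.g. 20150312
    year INTEGER, month INTEGER, day INTEGER
);
CREATE TABLE dim_product (
    product_key INTEGER PRIMARY KEY,
    name TEXT,
    category TEXT                   -- kept inline; a snowflake schema would split this out
);
-- Fact table: one row per sale, keys into the dimensions plus numeric measures.
CREATE TABLE fact_sales (
    date_key INTEGER REFERENCES dim_date(date_key),
    product_key INTEGER REFERENCES dim_product(product_key),
    quantity INTEGER,
    amount REAL
);
INSERT INTO dim_date VALUES (20150311, 2015, 3, 11), (20150312, 2015, 3, 12);
INSERT INTO dim_product VALUES (1, 'Widget', 'Hardware'), (2, 'Manual', 'Books');
INSERT INTO fact_sales VALUES
    (20150311, 1, 2, 20.0),
    (20150312, 1, 1, 10.0),
    (20150312, 2, 3, 45.0);
""")

def sales_by_category(conn):
    """OLAP-style query: aggregate a measure, grouped by a dimension attribute."""
    return conn.execute("""
        SELECT p.category, SUM(f.amount)
        FROM fact_sales AS f
        JOIN dim_product AS p ON p.product_key = f.product_key
        GROUP BY p.category
        ORDER BY p.category
    """).fetchall()

print(sales_by_category(conn))
```

Every analytical question becomes one join from the fact table to a dimension, which is what makes the star shape convenient for OLAP. A 3NF model of the same data would move category into its own table, which suits OLTP updates better but adds joins to every report.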
