Guideline: Physical Data Modeling
A physical data model (PDM) captures the design of a persistent data store such as a relational database or data file. Data modeling is the act of creating such a model.
Relationships
Related Elements
Main Description

Overview

Physical data models (PDMs) are used to design the structure of a persistent data store. Typically a PDM is created for a single data store, although sometimes a PDM will cover several related data stores (this is particularly true with the data storage mechanism is file based). The assumption in this guideline is that you are modeling the schema of a single relational database.

The Data Model in OpenUP

The PDM is part of the Work Product: Design. It’s described as different views or perspectives of a portion of the design.

Data Model Types

Traditionally, there are three types of data models:

  1. Conceptual. A conceptual model, also referred to as a domain model, depicts the critical business entities and the relationships between them.
  2. Logical. A logical data model (LDM) adds detail, in particular data attributes and more entities. LDMs will optionally indicate candidate keys (one or more attributes of an entity which uniquely identify it) of an entity. LDMs describe how the design of the system handles the data that will be actually maintained in the PDM.
  3. Physical. A PDM depicts the table structure (in the case of a relational database design), the relationships between the tables, and the primary and foreign keys implemented by the tables. PDMs potentially indicate views, stored procedures, and other database elements.

For systems built using object and/or component-based technology, the LDM is usually not created in favor of a class model.

Physical Data Modeling

The PDM consists of the detailed database table designs and their relationships. The tables in the PDM have well-defined columns, as well as keys and indexes as needed. The tables might also have triggers defined as necessary to support the database functionality and referential integrity of the system. In addition to the tables, stored procedures have been created, documented, and associated with the database in which the stored procedure will reside.

The diagram below shows an example of some of the elements of a PDM. A UML-based notation is used, although other notations such as “crow's feet” or Information Engineering (IE) are also common. This example model is a part of the PDM of a fictional order entry system. It depicts three tables (Order, OrderItem, and Item), three stored procedures (GetOrders, GetTotalBusiness, and TestDatabase), and a trigger on Order named deleteOrderItems. The figure also depicts the columns of each table, the primary key for each table, and any foreign keys to other tables.

Example Physical Data Model

 

An existing database can be reverse-engineered to populate the PDM if the team has access to a tool that can transform a database into a model.

How to Model Database Schemas

Perform the following in an iterative manner:

  1. Identify tables. A table is similar conceptually to object-orientation’s concept of a class – a table contains a collection of rows of data whereas a class defines a collection of objects. A table could contain rows representing people, places, things, events, or concepts. Examples of tables include Customer, Order, and OrderItem.
  2. Identify columns. Each table has one or more columns. A column stores a single data attribute for each row. For example, the Customer table may have columns such as First Name and Surname. A column has a single data type, such as INT, DATETIME, or VARCHAR.
  3. Follow accepted modeling conventions. Your organization should have standards and guidelines applicable to data modeling, in particular naming conventions, that you should follow.
  4. Identify relationships between tables. In the real world entities have relationships with other entities. For example, customers PLACE orders, customers LIVE AT addresses, and line items ARE PART OF orders. These relationships will exist between the rows of data stored in the corresponding tables.
  5. Assign keys. A key is one or more columns that uniquely identify a row in a table. A primary key is the preferred key for a table. For example, the Customer table may have two potential keys, CustomerID and SocialSecurityNumber. You could choose to use CustomerID as the primary key, thereby making SocialSecurityNumber a secondary key. Foreign keys are used to maintain relationships between rows.
  6. Normalize to reduce data redundancy. Data normalization is a process in which columns within a PDM are organized to increase the cohesion of tables. In other words, the goal of data normalization is to reduce and even eliminate data redundancy, an important consideration for application developers because it is incredibly difficult to store objects in a relational database that maintains the same information in several places.
  7. De-normalize to improve performance. Normalized data schemas, when put into production, often suffer from performance problems. An important part of data modeling is to de-normalize portions of the data schema, to increase data redundancy by storing the same information in several places or manners, to improve database access times.

Data Modeling Throughout the Lifecycle

Inception Phase

During the Inception phase the goal is to identify high-level requirements for the system so that the scope may be identified and project funding obtained. Little, if any, data modeling is performed at this point although some conceptual modeling may occur. Detailed data models are usually not needed at this point.

Elaboration Phase

The goal of the Elaboration phase is to eliminate technical risk and to produce a stable (baselined) architecture for the system. One of several architectural issues that is likely to arise is data architecture. To support this effort, you will need to model the major database structures (tables, indexes, and primary and foreign key columns) and then create the database schema from the model (ideally it would be generated from a modeling tool).

Additionally, representative data volumes may be loaded into the database to support performance testing. Based on the results of performance testing, the PDM might need to be adjusted with optimization techniques, including but not limited to de-normalizing, optimizing physical storage attributes, or distribution and indexing.

Construction Phase

During the Construction phase the goal is to build a working system that is ready to be released. During each iteration the design, implementation, and tests are fleshed out to implement that iteration's requirements. In other words development artifacts, including your data-oriented artifacts, evolve over time. To support data model changes you may discover the need to refactor your existing database schema.

Transition Phase

The PDM is maintained during the Transition phase in response to approved change requests. You should keep the PDM synchronized with the database as the application goes through final acceptance test and is deployed into production.

Round-trip Engineering Considerations

If a development team is using modern visual modeling tools that have the ability to convert classes to tables (and vice versa) or has the ability to reverse and forward engineer databases, then the team needs to establish guidelines for managing the transformation and engineering processes. The development team must define the points in the development of the application (build-and-release cycle) at which it will be appropriate to perform the class-to-table transformations and to forward-engineer the database. Once the initial database is created, the development team must define guidelines for the team to manage the synchronization of the PDM and database as the design and code of the system evolve throughout the project.