Overview
Physical data models (PDMs) are used to design the structure of a persistent data store. Typically a PDM is created for a
single data store, although sometimes a PDM will cover several related data stores (this is particularly true with the data
storage mechanism is file based). The assumption in this guideline is that you are modeling the schema of a single
relational database.
The Data Model in OpenUP
The PDM is part of the Work Product: Design. It’s described as different views or perspectives of a portion of the design.
Data Model Types
Traditionally, there are three types of data models:
-
Conceptual. A conceptual model, also referred to as a domain model, depicts the critical business entities and
the relationships between them.
-
Logical. A logical data model (LDM) adds detail, in particular data attributes and more entities. LDMs will
optionally indicate candidate keys (one or more attributes of an entity which uniquely identify it) of an
entity. LDMs describe how the design of the system handles the data that will be actually maintained in the
PDM.
-
Physical. A PDM depicts the table structure (in the case of a relational database design), the
relationships between the tables, and the primary and foreign keys implemented by the tables. PDMs
potentially indicate views, stored procedures, and other database elements.
For systems built using object and/or component-based technology, the LDM is usually not created in favor of a class
model.
Physical Data Modeling
The PDM consists of the detailed database table designs and their relationships. The tables in the PDM have
well-defined columns, as well as keys and indexes as needed. The tables might also have triggers defined as necessary
to support the database functionality and referential integrity of the system. In addition to the tables, stored
procedures have been created, documented, and associated with the database in which the stored procedure will reside.
The diagram below shows an example of some of the elements of a PDM. A UML-based notation is used, although other
notations such as “crow's feet” or Information Engineering (IE) are also common. This example model is a part of the
PDM of a fictional order entry system. It depicts three tables (Order, OrderItem, and Item), three stored procedures
(GetOrders, GetTotalBusiness, and TestDatabase), and a trigger on Order named deleteOrderItems. The figure also depicts
the columns of each table, the primary key for each table, and any foreign keys to other tables.
Example Physical Data Model
An existing database can be reverse-engineered to populate the PDM if the team has access to a tool that can transform
a database into a model.
How to Model Database Schemas
Perform the following in an iterative manner:
-
Identify tables. A table is similar conceptually to object-orientation’s concept of a class – a table contains a
collection of rows of data whereas a class defines a collection of objects. A table could contain rows representing
people, places, things, events, or concepts. Examples of tables include Customer, Order, and OrderItem.
-
Identify columns. Each table has one or more columns. A column stores a single data attribute for each row. For
example, the Customer table may have columns such as First Name and Surname. A column has a single data type, such
as INT, DATETIME, or VARCHAR.
-
Follow accepted modeling conventions. Your organization should have standards and guidelines applicable to data
modeling, in particular naming conventions, that you should follow.
-
Identify relationships between tables. In the real world entities have relationships with other entities. For
example, customers PLACE orders, customers LIVE AT addresses, and line items ARE PART OF orders. These
relationships will exist between the rows of data stored in the corresponding tables.
-
Assign keys. A key is one or more columns that uniquely identify a row in a table. A primary key is the preferred
key for a table. For example, the Customer table may have two potential keys, CustomerID and SocialSecurityNumber.
You could choose to use CustomerID as the primary key, thereby making SocialSecurityNumber a secondary key. Foreign
keys are used to maintain relationships between rows.
-
Normalize to reduce data redundancy. Data normalization is a process in which columns within a PDM are organized to
increase the cohesion of tables. In other words, the goal of data normalization is to reduce and even eliminate
data redundancy, an important consideration for application developers because it is incredibly difficult to store
objects in a relational database that maintains the same information in several places.
-
De-normalize to improve performance. Normalized data schemas, when put into production, often suffer from
performance problems. An important part of data modeling is to de-normalize portions of the data schema, to
increase data redundancy by storing the same information in several places or manners, to improve database access
times.
Data Modeling Throughout the Lifecycle
Inception Phase
During the Inception phase the goal is to identify high-level requirements for the system so that the scope may be
identified and project funding obtained. Little, if any, data modeling is performed at this point although some
conceptual modeling may occur. Detailed data models are usually not needed at this point.
Elaboration Phase
The goal of the Elaboration phase is to eliminate technical risk and to produce a stable (baselined) architecture for
the system. One of several architectural issues that is likely to arise is data architecture. To support this effort,
you will need to model the major database structures (tables, indexes, and primary and foreign key columns) and then
create the database schema from the model (ideally it would be generated from a modeling tool).
Additionally, representative data volumes may be loaded into the database to support performance testing. Based on the
results of performance testing, the PDM might need to be adjusted with optimization techniques, including but not
limited to de-normalizing, optimizing physical storage attributes, or distribution and indexing.
Construction Phase
During the Construction phase the goal is to build a working system that is ready to be released. During each
iteration the design, implementation, and tests are fleshed out to implement that iteration's requirements.
In other words development artifacts, including your data-oriented artifacts, evolve over time. To support data model
changes you may discover the need to refactor your existing database schema.
Transition Phase
The PDM is maintained during the Transition phase in response to approved change requests. You should keep the PDM
synchronized with the database as the application goes through final acceptance test and is deployed into production.
Round-trip Engineering Considerations
If a development team is using modern visual modeling tools that have the ability to convert classes to tables (and
vice versa) or has the ability to reverse and forward engineer databases, then the team needs to establish guidelines
for managing the transformation and engineering processes. The development team must define the points in the
development of the application (build-and-release cycle) at which it will be appropriate to perform the class-to-table
transformations and to forward-engineer the database. Once the initial database is created, the development team must
define guidelines for the team to manage the synchronization of the PDM and database as the design and code of the
system evolve throughout the project.
|