Designing complete and correct data elements of the solution…
In the Analysis Phase, we spend time working with our clients on developing a data model. The data model describes the various data elements of the solution and their relationships to each other. In the Design Phase these data elements are grouped together into logical files, tables and databases.
Experienced Data Architects & Administrators
Large projects typically have a specialist called a Data Architects and Administrators who work on designing the datastores. Darby provides experienced Data Architects and Administrators with in-depth and specialized knowledge in designing tables and databases so that they strictly meet the business requirements, but are also as efficient as possible. Our architects are typically involved in the project during the Analysis Phase in gathering data requirements and putting together the data models as well as making sure that the data relationships are complete and correct.
Our Data Modeling Methodology
Step 1: Group Together Related Data Elements
After the individual data elements are defined (usually in the data model), their relationship to each other must be validated. The starting point for this effort should be in the data model. If the data model is thorough and sophisticated, many of these relationships will already be identified. After reviewing the data models, we look at the requirements to see the type of information that each transaction or function needs. This will help us determine which set of data elements should be grouped together.
Step 2: Identify Keys Based on How the Data Relates
After we have grouped most of the data elements together, we should start to see how the different tables relate to each other. In some cases there will be no obvious relationship. In other cases we will see the connections. If we completed a data model as a part of the Analysis Phase, this relationship between data elements may already have been defined.
Step 3: Reduce Data Redundancy
Part of our “normalization” process is to look at the groups of data and eliminate as many data redundancies as possible and practical.
Step 4: Identify Secondary Keys Based on Data Access Patterns
After the logical design is initially laid out, our Data Analysts start to look at how the business processes work. There may be times when certain business functions or transactions need data elements from different tables, even though there is not necessarily an obvious logical relationship between the tables. If there is a relationship based on a key value, we might find that this key is not the right way to relate the tables at that point in the process. When this happens, the Data Analyst can designate secondary keys. The keys allow tables to interrelate in ways other than that based on the primary key. This is an example of introducing compromises into the logical design. Secondary keys take up more space and introduce complexity. However, they will result in easier programming of the components and usually faster performance when the process executes.
Step 5: Create Views
The data analyst can define different views of the data so that people only have access to the specific data elements they need. There are at least three ways we view the data elements.
- Physical. At the lowest level is the actual physical view that corresponds to the exact way that the data is structured into tables and files. This is also referred to as the internal view. The physical view is not known during the design phase, since the actual physical structures may be somewhat different than the final logical design.
- Logical. This is also known as the conceptual design. This is the final logical structure that is delivered from the Design Phase. If people want to see how the data is structured and modeled, they would see the logical view. It may appear to be the same as the physical view, but typically it is not.
- User. The user view is the way the data is represented to external parties. This is also called the external view. These external parties may be users of the solution, but they can be the programmers as well. When people need to access the data, they describe the elements they need to see and how they will use the information. They may be given access to the logical view of the data. However, they may also get access to the data through specialized user views. These views may contain only the data needed by that person, without showing the rest of the elements that may also be in the tables. If tables need to be related (joined) to create a complete picture, the view may have these elements joined already.
Step 6: Define Interim and Calculated Elements
In addition to the actual data elements defined in the data model, there may be additional elements defined to make the solution more efficient, even though these may compromise the purity of the data relationships. For instance, we may store calculated values that would take a while to calculate in real time, or values that are needed very often, in a table. We may also add fields that are needed to make it easier to connect different tables. There may be other new elements that have a value depending on the data of a second field. In some instances, these additional values make it possible to build relationships that would not otherwise exist. In other cases, the values stored in the table will save processing time and help the performance of the solution.
Step 7: Create a Data Dictionary
The data model created in the Analysis Phase is the source for a data dictionary that is defined for the datastores. The data dictionary contains metadata, or “data about the data”. In other words, the data dictionary defines and describes each data element, each interim and calculated element, each row, each table, each key, etc. Depending on the technology we are using, the data dictionary can be started in the Analysis Phase and further updated in the Design Phase. Information in the data dictionary includes:
- Element name
- Element description / definition
- Attributes such as the length and type of data.
- Synonyms or other names by which this entity is known. This is helpful if other solutions already exist that reference this element by other names.
- How the entity is derived. This is used for interim values, calculated elements or entities made from a combination of other entities.
- Edits that are made before the data values are stored. For instance, some elements cannot be blank, or must be greater than zero, or must be numeric.