## The Infinite Czerepak Regression

The relational model is said to be based on Cartesian mathematics.  Descartes greatest contribution was a Cartesian geometry which was the creation of a coordinate system for Euclidean geometry and geometry in general.  Whatever database you use, it has a coordinate system which gives it its structure.  Even a single bit is a structure all by itself.  Cartesian geometry took Euclidean geometry out of synthesis based on theorems and logic and moved it into coordinates based on analysis and algebra.  We are dealing with linear analysis and linear algebra when we are talking about the relational model and it forms the foundation for calculus and relational calculus which lead to  E. F. Codd’s inventing the relational model for database management, the theoretical basis for relational databases.

The truth about a relational database is conceptually it is only one table.  Only one.  Any relational database could be converted into a single spreadsheet.  So, why all the tables?

First, every database can be separated into the logical structure and the physical content.  The logical structure is called the schema and the physical content is called the data.  Logical and physical separation optimizes the administration, management and employment of the database.  As soon as you separate the logical and physical tables you eliminate an enormous amount of duplication in the system as well as all the effort required to maintain the correct values among all the duplicates.  This optimization is the first normalization of the database.

Now, Codd came up with what are known as The Twelve Rules which I have copied from Wikipedia:

### The Codd Rules

Rule 0: The system must qualify as relational, as a database, and as a management system.

For a system to qualify as a relational database management system (RDBMS), that system must use its relational facilities (exclusively) to manage the database.

Rule 1: The information rule:

All information in the database is to be represented in one and only one way, namely by values in column positions within rows of tables.

Rule 2: The guaranteed access rule:

All data must be accessible. This rule is essentially a restatement of the fundamental requirement for primary keys. It says that every individual scalar value in the database must be logically addressable by specifying the name of the containing table, the name of the containing column and the primary key value of the containing row.

Rule 3: Systematic treatment of null values:

The DBMS must allow each field to remain null (or empty). Specifically, it must support a representation of “missing information and inapplicable information” that is systematic, distinct from all regular values (for example, “distinct from zero or any other number”, in the case of numeric values), and independent of data type. It is also implied that such representations must be manipulated by the DBMS in a systematic way.

Rule 4: Active online catalog based on the relational model:

The system must support an online, inline, relational catalog that is accessible to authorized users by means of their regular query language. That is, users must be able to access the database’s structure (catalog) using the same query language that they use to access the database’s data.

Rule 5: The comprehensive data sublanguage rule:

The system must support at least one relational language that

1. Has a linear syntax
2. Can be used both interactively and within application programs,
3. Supports data definition operations (including view definitions), data manipulation operations (update as well as retrieval), security and integrity constraints, and transaction management operations (begin, commit, and rollback).

Rule 6: The view updating rule:

All views that are theoretically updatable must be updatable by the system.

Rule 7: High-level insert, update, and delete:

The system must support set-at-a-time insert, update, and delete operators. This means that data can be retrieved from a relational database in sets constructed of data from multiple rows and/or multiple tables. This rule states that insert, update, and delete operations should be supported for any retrievable set rather than just for a single row in a single table.

Rule 8: Physical data independence:

Changes to the physical level (how the data is stored, whether in arrays or linked lists etc.) must not require a change to an application based on the structure.

Rule 9: Logical data independence:

Changes to the logical level (tables, columns, rows, and so on) must not require a change to an application based on the structure. Logical data independence is more difficult to achieve than physical data independence.

Rule 10: Integrity independence:

Integrity constraints must be specified separately from application programs and stored in the catalog. It must be possible to change such constraints as and when appropriate without unnecessarily affecting existing applications.

Rule 11: Distribution independence:

The distribution of portions of the database to various locations should be invisible to users of the database. Existing applications should continue to operate successfully :

1. when a distributed version of the DBMS is first introduced; and
2. when existing distributed data are redistributed around the system.

Rule 12: The nonsubversion rule:

If the system provides a low-level (record-at-a-time) interface, then that interface cannot be used to subvert the system, for example, bypassing a relational security or integrity constraint.

Now there is another way to look at Codd’s Twelve Rules that is not often discussed directly and that is the OSI Model. The OSI Model is used to describe network layers and it is perfectly logical to regard a relational database as a dynamic lattice network composed of conceptual layers.

The Open System Interconnection Reference Model (OSI Reference Model or OSI Model) is an abstract description for layered communications and computer network protocol design. It was developed as part of the Open Systems Interconnection (OSI) initiative. In its most basic form, it divides network architecture into seven layers which, from top to bottom, are the Application, Presentation, Session, Transport, Network, Data-Link, and Physical Layers. It is therefore often referred to as the OSI Seven Layer Model.

 Application (Layer 7) This layer supports application and end-user processes. Communication partners are identified, quality of service is identified, user authentication and privacy are considered, and any constraints on data syntax are identified. Everything at this layer is application-specific. This layer provides application services for file transfers, e-mail, and other network software services. Telnet and FTP are applications that exist entirely in the application level. Tiered application architectures are part of this layer. Presentation (Layer 6) This layer provides independence from differences in data representation (e.g., encryption) by translating from application to network format, and vice versa. The presentation layer works to transform data into the form that the application layer can accept. This layer formats and encrypts data to be sent across a network, providing freedom from compatibility problems. It is sometimes called the syntax layer. Session (Layer 5) This layer establishes, manages and terminates connections between applications. The session layer sets up, coordinates, and terminates conversations, exchanges, and dialogues between the applications at each end. It deals with session and connection coordination. Transport (Layer 4) This layer provides transparent transfer of data between end systems, or hosts, and is responsible for end-to-end error recovery and flow control. It ensures complete data transfer. Network (Layer 3) This layer provides switching and routing technologies, creating logical paths, known as virtual circuits, for transmitting data from node to node. Routing and forwarding are functions of this layer, as well as addressing, internetworking, error handling, congestion control and packet sequencing. Data Link (Layer 2) At this layer, data packets are encoded and decoded into bits. It furnishes transmission protocol knowledge and management and handles errors in the physical layer, flow control and frame synchronization. The data link layer is divided into two sub layers: The Media Access Control (MAC) layer and the Logical Link Control (LLC) layer. The MAC sub layer controls how a computer on the network gains access to the data and permission to transmit it. The LLC layer controls frame synchronization, flow control and error checking. Physical (Layer 1) This layer conveys the bit stream – electrical impulse, light or radio signal — through the network at the electrical and mechanical level. It provides the hardware means of sending and receiving data on a carrier, including defining cables, cards and physical aspects. Fast Ethernet, RS232, and ATM are protocols with physical layer components.

A layer is a collection of conceptually similar functions that provide services to the layer above it and receives service from the layer below it. On each layer an instance provides services to the instances at the layer above and requests service from the layer below. For example, a layer that provides error-free communications across a network provides the path needed by applications above it, while it calls the next lower layer to send and receive packets that make up the contents of the path. Conceptually two instances at one layer are connected by a horizontal protocol connection on that layer.

If we look at the Relational Model and Codd’s rules in the context of the OSI Model we can think of the rules in the following way as a hybrid I call the Czerepak Rules:

### The Czerepak Rules

• Why Layer: Protect Application Input definition from Catalog View definition.
• See Layer: Protect Catalog View definition from Catalog User definition
• Use Layer: Protect Catalog User definition from Catalog Language definition.
• Act Layer: Protect Catalog Language definition from Catalog Column definition
• Col Layer: Protect Catalog Column definition from Catalog Row definition
• Row Layer: Protect Catalog Row Definition from Catalog Constraint definition
• See Layer: Protect Data View definition from Data User definition
• Use Layer: Protect Data User definition from Data Language  definition
• Act Layer: Protect Data Language definition from Data Column definition
• Col Layer: Protect Data Column definition from Data Row definition
• Row Layer: Protect Data Row Definition from Data Constraint definition
• Yes Layer: Protect Data Constraint definition from Data Input definition
• Why Layer: Application Layer: Protect Data Input definition from Access Method View Definition

If you look carefully at the list you will see that it extends beyond the top and beyond the bottom in an infinite regression.

“Turtles all the way down,” or “The Infinite Turtle Theory,” refers to the infinite regression problem in cosmology posed by the Unmoved mover paradox. The phrase was popularized by Stephen Hawking in 1988. The “turtle” metaphor in the anecdote represents a popular notion of a “primitive cosmological myth“, viz. the flat earth supported on the back of a World Turtle.

It is interesting to note that “czerepak” is a Yiddish word which means “turtle”.

A comparable metaphor describing the circular cause and consequence for the same problem is the “chicken and egg problem“. Another metaphor addressing the problem of infinite regression, albeit not in a cosmological context, is Quis custodiet ipsos custodes? The same problem in epistemology is known as the Münchhausen Trilemma.