Entity–relationship model

An entity–relationship diagram using Chen's notation

An entity–relationship model (ER model) describes inter-related things of interest in a specific domain of knowledge. An ER model is composed of entity types (which classify the things of interest) and specifies relationships that can exist between instances of those entity types.

In software engineering an ER model is commonly formed to represent things that a business needs to remember in order to perform business processes. Consequently, the ER model becomes an abstract data model that defines a data or information structure that can be implemented in a database, typically a relational database.

Entity–relationship modeling was developed for database design by Peter Chen and published in a 1976 paper.[1] However, variants of the idea existed previously.[2] Some ER modelers show supertype and subtype entities connected by generalization–specialization relationships,[3] and an ER model can also be used in the specification of domain-specific ontologies.

Introduction

An entity–relationship model is usually the result of systematic analysis to define and describe what is important to processes in an area of a business. It does not define the business processes; it only presents a business data schema in graphical form. It is usually drawn as boxes (entities) connected by lines (relationships) that express the associations and dependencies between entities. An ER model can also be expressed in a verbal form, for example: one building may be divided into zero or more apartments, but one apartment can only be located in one building.

Entities may be characterized not only by relationships, but also by additional properties (attributes), which include identifiers called "primary keys". Diagrams created to represent attributes as well as entities and relationships may be called entity–attribute-relationship diagrams, rather than entity-relationship models.

An ER model is typically implemented as a database. In a simple relational database implementation, each row of a table represents one instance of an entity type, and each field in a table represents an attribute type. In a relational database, a relationship between entities is implemented by storing the primary key of one entity as a pointer or "foreign key" in the table of another entity.
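
As a minimal sketch of this mapping, the building and apartment example from the introduction could be implemented as follows. SQLite (through Python's standard sqlite3 module) is an assumption chosen for illustration, and all table and column names are hypothetical.

```python
import sqlite3

# In-memory database, used only for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite does not enforce foreign keys by default

conn.executescript("""
CREATE TABLE building (
    building_id INTEGER PRIMARY KEY,   -- primary key of the building entity type
    address     TEXT NOT NULL          -- an attribute
);
CREATE TABLE apartment (
    apartment_id INTEGER PRIMARY KEY,
    unit_number  TEXT NOT NULL,
    -- foreign key: each apartment is located in exactly one building
    building_id  INTEGER NOT NULL REFERENCES building(building_id)
);
""")

# One row per entity instance.
conn.execute("INSERT INTO building VALUES (1, '10 Example Street')")
conn.executemany(
    "INSERT INTO apartment VALUES (?, ?, ?)",
    [(1, "1A", 1), (2, "1B", 1)],
)

# Following the foreign key recovers the relationship between the entities.
rows = conn.execute("""
    SELECT a.unit_number, b.address
    FROM apartment AS a
    JOIN building AS b ON b.building_id = a.building_id
    ORDER BY a.unit_number
""").fetchall()
print(rows)
```

Each row of the apartment table is one instance of the apartment entity type, and its building_id column is the foreign key that records which building the apartment belongs to.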

There is a tradition for ER/data models to be built at two or three levels of abstraction. Note that the conceptual-logical-physical hierarchy below is used in other kinds of specification, and is different from the three schema approach to software engineering.

Conceptual data model
This is the highest level ER model in that it contains the least granular detail but establishes the overall scope of what is to be included within the model set. The conceptual ER model normally defines master reference data entities that are commonly used by the organization. Developing an enterprise-wide conceptual ER model is useful to support documenting the data architecture for an organization.
A conceptual ER model may be used as the foundation for one or more logical data models (see below). The purpose of the conceptual ER model is then to establish structural metadata commonality for the master data entities between the set of logical ER models. The conceptual data model may be used to form commonality relationships between ER models as a basis for data model integration.
Logical data model
A logical ER model does not require a conceptual ER model, especially if the scope of the logical ER model includes only the development of a distinct information system. The logical ER model contains more detail than the conceptual ER model. In addition to master data entities, operational and transactional data entities are now defined. The details of each data entity are developed and the relationships between these data entities are established. The logical ER model is, however, developed independently of the technology in which it will be implemented.
Physical data model
One or more physical ER models may be developed from each logical ER model. The physical ER model is normally developed to be instantiated as a database. Therefore, each physical ER model must contain enough detail to produce a database and each physical ER model is technology dependent since each database management system is somewhat different.
The physical model is normally instantiated in the structural metadata of a database management system as relational database objects such as database tables, database indexes such as unique key indexes, and database constraints such as a foreign key constraint or a commonality constraint. The ER model is also normally used to design modifications to the relational database objects and to maintain the structural metadata of the database.
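
The following sketch illustrates how the objects named above (tables, a unique key index and a foreign key constraint) might look in one particular database management system. SQLite is an assumption made for the example, and the employee and department tables, their columns and the sample values are hypothetical; other systems use somewhat different syntax and defaults.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite-specific: enforcement is off by default

conn.executescript("""
CREATE TABLE department (
    department_id INTEGER PRIMARY KEY,
    name          TEXT NOT NULL
);
CREATE TABLE employee (
    employee_id   INTEGER PRIMARY KEY,
    ssn           TEXT NOT NULL,
    department_id INTEGER NOT NULL,
    FOREIGN KEY (department_id) REFERENCES department(department_id)
);
-- unique key index: no two employees may share an SSN
CREATE UNIQUE INDEX employee_ssn_uk ON employee(ssn);
""")


def try_insert(sql, params):
    """Run one INSERT and report whether the database accepted it."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(sql, params)
        return "ok"
    except sqlite3.IntegrityError as exc:
        return f"rejected: {exc}"


results = [
    try_insert("INSERT INTO department VALUES (?, ?)", (1, "Research")),
    try_insert("INSERT INTO employee VALUES (?, ?, ?)", (1, "000-00-0001", 1)),
    # duplicate SSN: violates the unique index
    try_insert("INSERT INTO employee VALUES (?, ?, ?)", (2, "000-00-0001", 1)),
    # department 99 does not exist: violates the foreign key constraint
    try_insert("INSERT INTO employee VALUES (?, ?, ?)", (3, "000-00-0003", 99)),
]
for line in results:
    print(line)
```

The first two inserts are accepted, while the last two are rejected by the database itself, which is the sense in which the constraints form part of its structural metadata.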

The first stage of information system design uses these models during the requirements analysis to describe information needs or the type of information that is to be stored in a database. The data modeling technique can be used to describe any ontology (i.e. an overview and classifications of used terms and their relationships) for a certain area of interest. In the case of the design of an information system that is based on a database, the conceptual data model is, at a later stage (usually called logical design), mapped to a logical data model, such as the relational model; this in turn is mapped to a physical model during physical design. Note that sometimes, both of these phases are referred to as "physical design."

Entity–relationship modeling

An entity may be defined as a thing capable of an independent existence that can be uniquely identified. An entity is an abstraction from the complexities of a domain. When we speak of an entity, we normally speak of some aspect of the real world that can be distinguished from other aspects of the real world.[4]

An entity is a thing that exists either physically or logically. An entity may be a physical object such as a house or a car (they exist physically), an event such as a house sale or a car service, or a concept such as a customer transaction or order (they exist logically—as a concept). Although the term entity is the one most commonly used, following Chen we should really distinguish between an entity and an entity-type. An entity-type is a category. An entity, strictly speaking, is an instance of a given entity-type. There are usually many instances of an entity-type. Because the term entity-type is somewhat cumbersome, most people tend to use the term entity as a synonym for this term.

Entities can be thought of as nouns. Examples: a computer, an employee, a song, a mathematical theorem.

A relationship captures how entities are related to one another. Relationships can be thought of as verbs, linking two or more nouns. Examples: an owns relationship between a company and a computer, a supervises relationship between an employee and a department, a performs relationship between an artist and a song, a proves relationship between a mathematician and a conjecture.

The model's linguistic aspect described above is utilized in the declarative database query language ERROL, which mimics natural language constructs. ERROL's semantics and implementation are based on reshaped relational algebra (RRA), a relational algebra that is adapted to the entity–relationship model and captures its linguistic aspect.

Entities and relationships can both have attributes. Examples: an employee entity might have a Social Security Number (SSN) attribute; the proves relationship may have a date attribute.

Every entity (unless it is a weak entity) must have a minimal set of uniquely identifying attributes, which is called the entity's primary key.

Entity–relationship diagrams don't show single entities or single instances of relations. Rather, they show entity sets (all entities of the same entity type) and relationship sets (all relationships of the same relationship type). Example: a particular song is an entity. The collection of all songs in a database is an entity set. The eaten relationship between a child and her lunch is a single relationship. The set of all such child-lunch relationships in a database is a relationship set. In other words, a relationship set corresponds to a relation in mathematics, while a relationship corresponds to a member of the relation.
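
The correspondence with a mathematical relation can be sketched in a few lines of Python. The Artist and Song classes, the sample data and the use of primary-key pairs to represent relationships are all assumptions made for illustration.

```python
from dataclasses import dataclass
from itertools import product


@dataclass(frozen=True)
class Artist:          # an entity type
    artist_id: int     # primary key
    name: str


@dataclass(frozen=True)
class Song:            # another entity type
    song_id: int       # primary key
    title: str


# Entity sets: all entities of the same entity type.
artists = {Artist(1, "Artist A"), Artist(2, "Artist B")}
songs = {Song(10, "Song X"), Song(11, "Song Y"), Song(12, "Song Z")}

# Relationship set "performs": each pair is one relationship,
# identified here by the primary keys of the participating entities.
performs = {(1, 10), (1, 11), (2, 12)}

# A relationship set is a relation in the mathematical sense:
# a subset of the Cartesian product of the participating entity sets.
artist_ids = {a.artist_id for a in artists}
song_ids = {s.song_id for s in songs}
all_possible_pairs = set(product(artist_ids, song_ids))

is_relation = performs <= all_possible_pairs
print(is_relation)
```

A single pair such as (1, 10) is one relationship, that is, one member of the relation.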

Certain cardinality constraints on relationship sets may be indicated as well.

Mapping natural language

Chen proposed the following "rules of thumb" for mapping natural language descriptions into ER diagrams (from "English, Chinese and ER diagrams" by Peter Chen):

English grammar structure    ER structure
Common noun                  Entity type
Proper noun                  Entity
Transitive verb              Relationship type
Intransitive verb            Attribute type
Adjective                    Attribute for entity
Adverb                       Attribute for relationship
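
The rules of thumb above amount to a lookup from grammatical category to ER construct, which can be sketched as follows. The function name and the hand-tagged example sentence are hypothetical; the sketch assumes the part of speech of each word is already known and does not attempt any language analysis.

```python
# Chen's rules of thumb as a lookup table (keys lower-cased for convenience).
ER_MAPPING = {
    "common noun": "entity type",
    "proper noun": "entity",
    "transitive verb": "relationship type",
    "intransitive verb": "attribute type",
    "adjective": "attribute for entity",
    "adverb": "attribute for relationship",
}


def map_to_er(tagged_words):
    """Map (word, part-of-speech) pairs to candidate ER constructs.

    Words whose part of speech has no rule of thumb are skipped.
    """
    return [
        (word, ER_MAPPING[tag])
        for word, tag in tagged_words
        if tag in ER_MAPPING
    ]


# "An employee supervises a department", tagged by hand.
sentence = [
    ("an", "article"),
    ("employee", "common noun"),
    ("supervises", "transitive verb"),
    ("a", "article"),
    ("department", "common noun"),
]
candidates = map_to_er(sentence)
print(candidates)
```

The result is only a list of candidates; deciding which of them belong in the model remains a matter of analysis.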

The physical view shows how data is actually stored.

Relationships, roles and cardinalities

In Chen's original paper he gives an example of a relationship and its roles. He describes a relationship "marriage" and its two roles "husband" and "wife".

A person plays the role of husband in a marriage (relationship) and another person plays the role of wife in the (same) marriage. These words are nouns. That is no surprise; naming things requires a noun.

Chen's terminology has also been applied to earlier ideas. The lines, arrows and crow's feet of some diagrams owe more to the earlier Bachman diagrams than to Chen's relationship diamonds.

Another common extension to Chen's model is to "name" relationships and roles as verbs or phrases.

Role naming

It has also become prevalent to name roles with phrases such as is the owner of and is owned by. The correct nouns in this case are owner and possession. Thus person plays the role of owner and car plays the role of possession, rather than person plays the role of is the owner of, and so on.

The use of nouns has direct benefit when generating physical implementations from semantic models. When a person has two relationships with car then it is possible to generate names such as owner_person and driver_person, which are immediately meaningful.[5]
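
A minimal sketch of such name generation is shown below. The role_column helper and its naming rule (role, underscore, entity type) are assumptions made for illustration, not a convention prescribed by the cited source.

```python
def role_column(role: str, entity_type: str) -> str:
    """Build a column name from a role name and the entity type playing it."""
    return f"{role}_{entity_type}".lower().replace(" ", "_")


# A car has two relationships with person, distinguished by role.
noun_roles = ["owner", "driver"]
noun_columns = [role_column(role, "person") for role in noun_roles]
print(noun_columns)

# The same rule applied to a verb-phrase role name gives a clumsier result.
phrase_column = role_column("is the owner of", "person")
print(phrase_column)
```

With noun roles the generated names are owner_person and driver_person, whereas the phrase-style role yields is_the_owner_of_person.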

Cardinalities

Modifications to the original specification can be beneficial. Chen described look-across cardinalities. As an aside, the Barker–Ellis notation, used in Oracle Designer, uses same-side for minimum cardinality (analogous to optionality) and role, but look-across for maximum cardinality (the crow's foot).

In Merise,[6] Elmasri & Navathe[7] and others[8] there is a preference for same-side for roles and both minimum and maximum cardinalities. Recent researchers (Feinerer,[9] Dullea et al.[10]) have shown that this is more coherent when applied to n-ary relationships of order greater than 2.

In Dullea et al. one reads "A 'look across' notation such as used in the UML does not effectively represent the semantics of participation constraints imposed on relationships where the degree is higher than binary."

Feinerer says: "Problems arise if we operate under the look-across semantics as used for UML associations. Hartmann[11] investigates this situation and shows how and why different transformations fail." (The "reduction" mentioned is, however, spurious, as the two diagrams 3.4 and 3.5 are in fact the same.) Feinerer also says: "As we will see on the next few pages, the look-across interpretation introduces several difficulties that prevent the extension of simple mechanisms from binary to n-ary associations."
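
The difference between the two readings can be illustrated on a small ternary relationship set. The sketch below is not taken from the cited papers: the supplies relationship, its sample tuples and both helper functions are hypothetical, and the look-across count is computed only over combinations that actually occur in the data.

```python
from collections import Counter, defaultdict

# A ternary relationship set: (supplier, part, project).
supplies = {
    ("s1", "p1", "j1"),
    ("s1", "p2", "j1"),
    ("s1", "p1", "j2"),
    ("s2", "p1", "j1"),
}


def participation(rel, pos):
    """Same-side reading: how many relationships each entity at `pos` takes part in."""
    return Counter(t[pos] for t in rel)


def look_across(rel, pos):
    """Look-across reading: for each combination of the other participants,
    how many distinct entities at `pos` are related to that combination."""
    groups = defaultdict(set)
    for t in rel:
        others = t[:pos] + t[pos + 1:]
        groups[others].add(t[pos])
    return {others: len(values) for others, values in groups.items()}


supplier_participation = participation(supplies, 0)
supplier_look_across = look_across(supplies, 0)

print(max(supplier_participation.values()))  # most relationships for one supplier
print(max(supplier_look_across.values()))    # most suppliers for one (part, project) pair
```

Here supplier s1 takes part in three relationships, while no (part, project) pair has more than two suppliers, so the two readings yield different maxima for the same data. In a binary relationship the look-across count for one side is simply the participation count of the other side, which is why the difference only becomes visible for relationships of higher degree.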

Various methods of representing the same one to many relationship. In each case, the diagram shows the relationship between a person and a place of birth: each person must have been born at one, and only one, location, but each location may have had zero or more people born at it.
Two related entities shown using Crow's Foot notation. In this example, an optional relationship is shown between Artist and Song; the symbols closest to the Song entity represent "zero, one, or many", whereas a song has "one and only one" Artist. The former is therefore read as: an Artist (can) perform(s) "zero, one, or many" song(s).
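
The place-of-birth constraint described in the first caption can be expressed as a simple check over a relationship set. This is a sketch under stated assumptions: the check_born_in function and the sample people and places are hypothetical.

```python
def check_born_in(people, places, born_in):
    """Return a list of constraint violations (empty if the data is valid).

    Constraint: each person was born at one, and only one, place;
    a place may have zero or more people born at it.
    """
    problems = []
    counts = {person: 0 for person in people}
    for person, place in sorted(born_in):
        if person not in counts:
            problems.append(f"unknown person: {person}")
        elif place not in places:
            problems.append(f"unknown place: {place}")
        else:
            counts[person] += 1
    for person in sorted(counts):
        if counts[person] != 1:
            problems.append(
                f"{person} has {counts[person]} birthplaces, expected exactly 1"
            )
    return problems


people = {"alice", "bob"}
places = {"town_a", "town_b", "town_c"}

# Valid: every person has exactly one birthplace; town_c has nobody, which is allowed.
valid = check_born_in(people, places, {("alice", "town_a"), ("bob", "town_a")})

# Invalid: bob has no birthplace and alice has two.
invalid = check_born_in(people, places, {("alice", "town_a"), ("alice", "town_b")})

print(valid)
print(invalid)
```

No check is needed on the place side, because "zero or more" places no restriction on how many people may be related to a given place.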

Chen's notation for entity–relationship modeling uses rectangles to represent entity sets, and diamonds to represent relationships, which are treated as first-class objects: they can have attributes and relationships of their own. If an entity set participates in a relationship set, they are connected with a line.

Attributes are drawn as ovals and are connected with a line to exactly one entity or relationship set.
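
These drawing conventions can be sketched by generating a description in the Graphviz DOT language, whose box, diamond and ellipse node shapes correspond to the rectangles, diamonds and ovals above. The chen_dot function and the Artist/Song example are hypothetical, and rendering the output requires Graphviz, which this sketch does not invoke.

```python
def chen_dot(entity_sets, relationship_sets, attributes):
    """Return DOT text for a Chen-style diagram.

    entity_sets:       list of entity set names (drawn as rectangles)
    relationship_sets: dict mapping a relationship set name (diamond)
                       to the entity sets that participate in it
    attributes:        dict mapping an entity or relationship set name
                       to its attribute names (ovals)
    """
    lines = ["graph ER {"]
    for name in entity_sets:
        lines.append(f'  "{name}" [shape=box];')
    for name, participants in relationship_sets.items():
        lines.append(f'  "{name}" [shape=diamond];')
        for participant in participants:
            # a participating entity set is connected to the relationship set by a line
            lines.append(f'  "{participant}" -- "{name}";')
    for owner, names in attributes.items():
        for attr in names:
            # one node per (owner, attribute), so each oval has exactly one owner
            node_id = f"{owner}.{attr}"
            lines.append(f'  "{node_id}" [shape=ellipse, label="{attr}"];')
            lines.append(f'  "{owner}" -- "{node_id}";')
    lines.append("}")
    return "\n".join(lines)


dot = chen_dot(
    entity_sets=["Artist", "Song"],
    relationship_sets={"performs": ["Artist", "Song"]},
    attributes={"Artist": ["name"], "Song": ["title"], "performs": ["date"]},
)
print(dot)
```

Because the performs relationship set is given its own date attribute, the example also shows a relationship being treated like an entity set in this respect.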

Cardinality constraints are expressed as follows:

Attributes are often omitted as they can clutter up a diagram; other diagram techniques often list entity attributes within the rectangles drawn for entity sets.

Related diagramming convention techniques:

Crow's foot notation

Crow's foot notation is used in Barker's Notation, Structured Systems Analysis and Design Method (SSADM) and information engineering. Crow's foot diagrams represent entities as boxes, and relationships as lines between the boxes. Different shapes at the ends of these lines represent the cardinality of the relationship.
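
As a sketch of how the line-end shapes encode cardinality, the function below assumes the commonly used convention in which a circle stands for zero, a bar for one and the three-pronged crow's foot for many. The function name, the MANY marker and the wording of the readings are assumptions made for illustration, and individual tools may draw the symbols differently.

```python
MANY = "many"  # hypothetical marker for an unbounded maximum


def crows_foot_end(minimum, maximum):
    """Describe the symbols at one end of a relationship line.

    Only the four usual combinations are covered:
    minimum 0 or 1, maximum 1 or MANY.
    """
    if minimum not in (0, 1) or maximum not in (1, MANY):
        raise ValueError("this sketch covers only min 0/1 and max 1/many")
    min_symbol = "circle" if minimum == 0 else "bar"
    max_symbol = "bar" if maximum == 1 else "crow's foot"
    readings = {
        (0, 1): "zero or one",
        (1, 1): "one and only one",
        (0, MANY): "zero, one, or many",
        (1, MANY): "one or many",
    }
    return {
        "symbols": (min_symbol, max_symbol),
        "reading": readings[(minimum, maximum)],
    }


song_end = crows_foot_end(0, MANY)   # an artist performs zero, one, or many songs
artist_end = crows_foot_end(1, 1)    # a song has one and only one artist
print(song_end)
print(artist_end)
```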

Crow's foot notation was used in the consultancy practice CACI. Many of the consultants at CACI (including Richard Barker) subsequently moved to Oracle UK, where they developed the early versions of Oracle's CASE tools, introducing the notation to a wider audience.

ER diagramming tools

There are many ER diagramming tools. A freeware ER tool that can generate database and application layer code (web services) is the RISE Editor. SQL Power Architect, while proprietary, also has a free community edition.

Proprietary ER diagramming tools

Proprietary ER diagramming tools that can interpret and generate ER models and SQL and do database analysis are:

LucidChart will generate an ERD from different schema types, but you can't generate SQL from an ERD.

Free/open-source tools

Free software ER diagramming tools that can interpret and generate ER models and SQL and do database analysis are:

Free software diagram tools that only draw the shapes, with no knowledge of what they mean and no ability to generate SQL, include:

Entity–relationships and semantic modeling

Semantic model

A semantic model is a model of concepts; it is sometimes called a "platform independent model". It is an intensional model. At the latest since Carnap, it is well known that:[13]

"...the full meaning of a concept is constituted by two aspects, its intension and its extension. The first part comprises the embedding of a concept in the world of concepts as a whole, i.e. the totality of all relations to other concepts. The second part establishes the referential meaning of the concept, i.e. its counterpart in the real or in a possible world".

Extension model

An extensional model is one that maps to the elements of a particular methodology or technology, and is thus a "platform specific model". The UML specification explicitly states that associations in class models are extensional, and this is in fact self-evident when one considers the extensive array of additional "adornments" provided by the specification over and above those provided by any of the prior candidate "semantic modelling languages" (see "UML as a Data Modeling Notation, Part 2").

Entity–relationship origins

Peter Chen, the father of ER modeling, said in his seminal paper:

"The entity-relationship model adopts the more natural view that the real world consists of entities and relationships. It incorporates some of the important semantic information about the real world." [1]

In his original 1976 article Chen explicitly contrasts entity–relationship diagrams with record modelling techniques:

"The data structure diagram is a representation of the organisation of records and is not an exact representation of entities and relationships."

Several other authors also support Chen's program:[14] [15] [16] [17] [18]

Philosophical alignment

Chen is in accord with philosophic and theoretical traditions from the time of the Ancient Greek philosophers: Socrates, Plato and Aristotle (428 BC) through to modern epistemology, semiotics and logic of Peirce, Frege and Russell.

Plato himself associates knowledge with the apprehension of unchanging Forms (the Forms, according to Socrates, are roughly speaking archetypes or abstract representations of the many types of things and properties) and their relationships to one another.

Limitations

References

  1. Chen, Peter (March 1976). "The Entity-Relationship Model - Toward a Unified View of Data". ACM Transactions on Database Systems 1 (1): 9–36. doi:10.1145/320434.320440.
  2. A.P.G. Brown, "Modelling a Real-World System and Designing a Schema to Represent It", in Douque and Nijssen (eds.), Data Base Description, North-Holland, 1975, ISBN 0-7204-2833-5.
  3. “Designing a Logical Database: Supertypes and Subtypes”
  4. Beynon-Davies, Paul (2004). Database Systems. Basingstoke, UK: Palgrave: Houndmills. ISBN 1403916012.
  5. Thomas Basboell: Motion and society. On meaningfulness of concepts
  6. Hubert Tardieu, Arnold Rochfeld and René Colletti La methode MERISE: Principes et outils (Paperback - 1983)
  7. Elmasri, Ramez; Navathe, Shamkant B., Fundamentals of Database Systems, third ed., Addison-Wesley, Menlo Park, CA, USA, 2000.
  8. ER 2004 : 23rd International Conference on Conceptual Modeling, Shanghai, China, November 8-12, 2004
  9. A Formal Treatment of UML Class Diagrams as an Efficient Method for Configuration Management 2007
  10. James Dullea, Il-Yeol Song, Ioanna Lamprou - An analysis of structural validity in entity-relationship modeling 2002
  11. Hartmann, Sven. "Reasoning about participation constraints and Chen's constraints". Proceedings of the 14th Australasian database conference-Volume 17. Australian Computer Society, Inc., 2003.
  12. ER2SQL website
  13. http://wenku.baidu.com/view/8048e7bb1a37f111f1855b22.html
  14. Kent in "Data and Reality" :
    "One thing we ought to have clear in our minds at the outset of a modelling endeavour is whether we are intent on describing a portion of "reality" (some human enterprise) or a data processing activity."
  15. Abrial in "Data Semantics" : "... the so called "logical" definition and manipulation of data are still influenced (sometimes unconsciously) by the "physical" storage and retrieval mechanisms currently available on computer systems."
  16. Stamper: "They pretend to describe entity types, but the vocabulary is from data processing: fields, data items, values. Naming rules don't reflect the conventions we use for naming people and things; they reflect instead techniques for locating records in files."
  17. In Jackson's words: "The developer begins by creating a model of the reality with which the system is concerned, the reality that furnishes its [the system's] subject matter ..."
  18. Elmasri, Navathe: "The ER model concepts are designed to be closer to the user’s perception of data and are not meant to describe the way in which data will be stored in the computer."
  19. P. Chen. Suggested research directions for a new frontier: Active conceptual modeling. ER 2006, volume 4215 of Lecture Notes in Computer Science, pages 1–4. Springer Berlin / Heidelberg, 2006.
  20. M. L. Brodie and J. T. Liu. The power and limits of relational technology in the age of information ecosystems. On The Move Federated Conferences, 2010.
  21. A. Badia and D. Lemire. A call to arms: revisiting database design. SIGMOD Record 40, 3 (November 2011), 61-69.
  22. Gregersen, Heidi; Jensen, Christian S. (1999). "Temporal Entity-Relationship models—a survey". IEEE Transactions on Knowledge and Data Engineering 11 (3): 464–497. CiteSeerX: 10.1.1.1.2497.
  23. RICCARDO TORLONE (2003). "Conceptual Multidimensional Models". In Maurizio Rafanelli. Multidimensional Databases: Problems and Solutions (PDF). Idea Group Inc (IGI). ISBN 978-1-59140-053-0.

This article is issued from Wikipedia - version of the Wednesday, May 04, 2016. The text is available under the Creative Commons Attribution/Share Alike but additional terms may apply for the media files.