A quick primer on graph databases

Graph databases have moved from an academic topic to the mainstream of information technology over the last few years. Now, IT professionals are confronted with the need to better understand:

What business problems do graph databases address well?
What advantages do graph databases offer over widely-implemented relational databases?
What issues emerge as graph databases are introduced into an existing application portfolio?

A graph database (GDB) uses graph structures to represent and store data. Graph databases emphasize relationships among data entities.
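To make the definition concrete, here is a minimal sketch of a property graph held in memory. The class name PropertyGraph, its methods and the sample people and company are hypothetical, chosen only for illustration. A real graph database persists and indexes these structures rather than keeping them in Python dictionaries.

```python
from collections import defaultdict


class PropertyGraph:
    """Toy in-memory property graph: nodes and relationships both carry properties."""

    def __init__(self):
        self.nodes = {}                    # node id -> property dict
        self.outgoing = defaultdict(list)  # node id -> [(type, target id, properties)]

    def add_node(self, node_id, **properties):
        self.nodes[node_id] = properties

    def add_relationship(self, source, rel_type, target, **properties):
        # The relationship is stored directly with its source node.
        self.outgoing[source].append((rel_type, target, properties))

    def neighbours(self, node_id, rel_type):
        """Ids of nodes reachable from node_id over one relationship of rel_type."""
        return [t for r, t, _ in self.outgoing[node_id] if r == rel_type]


graph = PropertyGraph()
graph.add_node("alice", label="Person", name="Alice")
graph.add_node("bob", label="Person", name="Bob")
graph.add_node("acme", label="Company", name="Acme")
graph.add_relationship("alice", "KNOWS", "bob", since=2019)
graph.add_relationship("alice", "WORKS_AT", "acme", role="Analyst")

print(graph.neighbours("alice", "KNOWS"))
print(graph.neighbours("alice", "WORKS_AT"))
```

The point of the sketch is that a relationship is a first-class item with its own type and properties, not a value derived at query time.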

DBMS developments

First, a bit of history: To improve data management and data processing as data volumes grew, database management systems (DBMS) emerged as a separate software layer between the operating system and the application program in the 1960s.

By the 1980s, relational DBMSs had become the principal type of DBMS, and they remain so today. The 2000s saw the emergence of graph databases, XML databases and NoSQL databases, along with the idea that databases didn’t need to be tightly structured in a purely tabular form.

During the 2010s, databases supporting the JSON open standard file format gained traction. We also saw the rise and eventual fall of Hadoop, a software framework for processing big data using a highly distributed storage system.

A note of caution: Graph databases are not a replacement for relational databases. The two types of databases fulfill different data processing and application objectives. This article focuses on describing the data and applications where graph databases can be a superior solution.

Data volume explosion

Fast forward to today: Data volumes continue to explode exponentially. Many sources generate vast data volumes, including:

Voracious data demand for AI applications and data analytics: The move toward more data-driven organizations has led to significant increases in the data collected and retained for AI and analytics.
Internet of Things (IoT): The number of industrial and consumer devices, or things, that monitor the performance of almost everything has exploded, and these devices generate huge data volumes.
Digitalization of society: The most obvious example is the vast volume of digital data available on the web and its consumption by billions of people.
Graphic and video data types: Originally, data meant letters and numbers only. Introducing graphics and videos has added many orders of magnitude more data.
Digital transformation of businesses and government: Most organizations are actively working to enhance application functionality and eliminate the remaining bits of paper and Excel workbooks that exist between their systems.

Graph database opportunities

Today’s problem: Despite the many advances in DBMSs and the huge improvements in computing infrastructure performance introduced over many decades, systems are straining or failing to handle these vast data volumes.

Today’s solution: Applications that access graph databases can solve various problems that cause frustration at the enterprise level. Examples of these applications, for which IT professionals need to consider graph databases seriously, include:

Artificial intelligence
Computing infrastructure monitoring
Customer 360 interaction analysis
Fraud detection
Knowledgebase
Metadata management
Master data management
Natural language processing
Recommendation engine
Social media influencer analysis.

These applications benefit from using graph databases because they:

Deliver excellent performance for complex data analytics
Simplify data ingestion and integration from diverse data sources
Manage vast data volumes reliably.

Advantages of graph databases

Graph databases are well-suited to managing highly interconnected data and to quickly producing concise results for complex queries.

For each advantage listed below, graph databases are compared with relational databases.

Query speed

All end-users are impatient and expect quick responses. Nobody cares about the impact of query complexity or the vastness of data volumes that a query must traverse to produce a result. DBMSs work hard to respond to this performance expectation.

In graph databases, query speed depends mainly on the number of relationships the query actually traverses, not on the total data volume. Because the engine reads only the data directly or closely related to the relationships being queried, response times stay short even as the database grows.

In relational databases, query speed depends on the number of tables to be joined and the amount of data in each table. This focus on tables and data volume means queries slow materially as the number of tables and the data volume involved increase. While good index design and effective query optimization can reduce response times, they are often not enough on their own.
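The contrast can be sketched by counting how many stored items each approach examines for a friends-of-friends query. This is an illustration under stated assumptions, not a benchmark: the data is random, the names are hypothetical, and the scan deliberately uses no index, which a real relational database would normally have.

```python
import random

random.seed(7)  # fixed seed so the sketch is repeatable

PEOPLE = 2_000
# Hypothetical friendship rows: (person_id, friend_id), five per person
rows = [(p, random.randrange(PEOPLE)) for p in range(PEOPLE) for _ in range(5)]

# Graph-style storage: each person keeps direct references to friends.
adjacency = {p: [] for p in range(PEOPLE)}
for person, friend in rows:
    adjacency[person].append(friend)


def friends_of_friends_scan(start):
    """Join-like approach without an index: scan every row once per hop."""
    examined = 0
    friends = set()
    for person, friend in rows:
        examined += 1
        if person == start:
            friends.add(friend)
    result = set()
    for person, friend in rows:
        examined += 1
        if person in friends:
            result.add(friend)
    return result, examined


def friends_of_friends_traverse(start):
    """Traversal approach: follow only the relationships stored with each node."""
    examined = 0
    result = set()
    for friend in adjacency[start]:
        examined += 1
        for friend_of_friend in adjacency[friend]:
            examined += 1
            result.add(friend_of_friend)
    return result, examined


scan_result, scan_cost = friends_of_friends_scan(0)
walk_result, walk_cost = friends_of_friends_traverse(0)
print("rows examined by scan:", scan_cost)
print("relationships examined by traversal:", walk_cost)
```

Both functions return the same set of people. The scan's cost grows with the size of the table, while the traversal's cost grows only with the number of relationships around the starting person.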

Representation of relationships

Whenever a DBMS accurately represents real-world relationships and avoids kluges or workarounds such as cross-reference tables or composite keys, it’s easier for software developers to understand the database’s data organization. That ease-of-understanding leads to:

More accurate, reliable solutions with less development effort
Reduced effort and elapsed time to implement future enhancements.

In graph databases, relationships are stored alongside attribute data. This relationship storage enables high-performance queries, even for complex queries or large data volumes.

In relational databases, relationships are defined by foreign key values or by software logic. Foreign keys are incredibly useful up to the point where they trigger too many joins or even force a self-join. At that point, foreign keys significantly degrade query performance. Defining relationships through software logic makes it difficult to understand them solely from the database schema and requires significant software maintenance effort.

Graph databases can easily represent and query hierarchies of data. Hierarchies are more difficult for relational databases to represent, leading to multiple tables that degrade query performance.
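As a rough illustration of a hierarchy query, the sketch below walks a hypothetical organization chart to find everyone under a given manager, at any depth. The names and the all_reports function are invented for this example. In a relational database, the same question typically needs a recursive query or one self-join per level.

```python
from collections import deque

# Hypothetical org chart: manager -> direct reports
reports = {
    "ceo": ["cto", "cfo"],
    "cto": ["dev_lead", "ops_lead"],
    "cfo": ["controller"],
    "dev_lead": ["dev_1", "dev_2"],
    "ops_lead": [],
    "controller": [],
    "dev_1": [],
    "dev_2": [],
}


def all_reports(manager):
    """Breadth-first walk down the hierarchy, returning (person, depth) pairs."""
    found = []
    queue = deque((person, 1) for person in reports.get(manager, []))
    while queue:
        person, depth = queue.popleft()
        found.append((person, depth))
        queue.extend((r, depth + 1) for r in reports.get(person, []))
    return found


print(all_reports("cto"))
```

The walk needs no knowledge of how deep the hierarchy is, which is what makes this kind of query natural for a graph.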

Representation of data structures

Whenever a DBMS accurately represents real-world data structures, the organization can realize more of the benefits listed above under Representation of relationships.

In graph databases, data structures are more flexible. Data is typically stored as nodes and relationships, each with its own properties, rather than in fixed tables, and new node types, relationship types and properties can be added dynamically.

These characteristics are particularly important when the data doesn’t have a specific format. A good example is Facebook comments or posts, which can include any combination of text, images, videos, links and geographic coordinates.
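A small sketch of that flexibility, using hypothetical post data: each post carries only the properties it actually has, and a new kind of content is added without redefining anything in advance. The property names and the posts_with helper are assumptions made for illustration.

```python
# Hypothetical post nodes: each carries only the properties it actually has.
posts = [
    {"id": 1, "text": "Great hike today"},
    {"id": 2, "text": "Lunch", "image": "lunch.jpg"},
    {"id": 3, "video": "summit.mp4", "coordinates": (51.18, -115.57)},
]


def posts_with(property_name):
    """Return the ids of posts that have the given property."""
    return [post["id"] for post in posts if property_name in post]


# A new kind of content needs no schema change: just add the property.
posts.append({"id": 4, "text": "Worth reading", "link": "https://example.com/article"})

print(posts_with("text"))
print(posts_with("link"))
```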

In relational databases, data structures are more rigid. They:

Always consist of related tables that together define and contain the data available for entities
Rely exclusively on values of foreign keys to represent the relationships between entities
Must be defined in advance.

Making changes to relational database data structures always requires careful impact analysis and planning. Often, an application outage is required to introduce the change.

Disadvantages of graph databases

For each disadvantage listed below, graph databases are compared with relational databases.

Rapidly evolving technology

Every graph database vendor regularly introduces major enhancements. This high level of product development creates:

Difficulty comparing products because the landscape is changing so quickly
Product stability issues because it’s difficult to test all this new software thoroughly.

Example graph database enhancements include support for:

Graph Retrieval-Augmented Generation (GraphRAG)
Graphics processing units (GPU)
Both property graphs and semantic graphs
High-speed data ingestion
Integrated data visualization
Integrated machine-learning algorithms and tools
JSON – the open standard file format for data storage
XML format data storage
NoSQL data structures
Document entity enrichment – parsing unstructured data for entity values to store as structured data.

Relational database vendors are also introducing many of these enhancements in response to competitive pressure and customer requests.

Difficult to scale

Most graph database designs initially used only a one-tier architecture. Some vendors have begun to offer sharding, which distributes a database across multiple servers to handle larger-scale databases.

Most relational databases have supported sharding for many years.
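One commonly cited reason sharding is harder for graphs is that relationships can span servers, so a traversal may have to hop between shards. The sketch below assigns hypothetical nodes to shards by hashing and counts the relationships that end up crossing shards. The names, the sample relationships and the hash-based placement are assumptions for illustration only. Real products use their own placement strategies.

```python
import zlib

# Hypothetical relationships between people
edges = [
    ("alice", "bob"),
    ("bob", "carol"),
    ("carol", "dave"),
    ("dave", "alice"),
    ("alice", "carol"),
]


def shard_for(node_id, shard_count):
    """Deterministic hash-based placement of a node on a shard."""
    return zlib.crc32(node_id.encode("utf-8")) % shard_count


def cross_shard_edges(edge_list, shard_count):
    """Relationships whose two endpoints land on different shards."""
    return [
        (a, b)
        for a, b in edge_list
        if shard_for(a, shard_count) != shard_for(b, shard_count)
    ]


for shard_count in (1, 2, 4):
    cut = cross_shard_edges(edges, shard_count)
    print(shard_count, "shards:", len(cut), "of", len(edges), "relationships cross shards")
```

With a single shard nothing crosses. As the shard count rises, more relationships tend to cross shard boundaries, and each one is a potential network hop during a query.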

No standard language

Graph database vendors have not converged on a single syntax or language for updates and queries, and every vendor claims its language is superior. Most vendors support some version of Gremlin, SPARQL or Cypher. ISO published GQL (ISO/IEC 39075) as a standard graph query language in 2024, but it is not yet widely implemented. This lack of standardization makes it difficult to migrate from one database product to another and adds to the cost of training staff in a particular language.

All relational databases support the standard SQL language for updates and queries. Although many vendors have extended SQL, all vendors support the core SQL language. This standardization makes it easy to find and onboard experienced staff.
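To give a feel for the differences, the sketch below holds the same question, "who does Alice know?", written in Cypher, Gremlin and SPARQL. The Person label, the KNOWS relationship, the name property and the example.org prefix are hypothetical, and exact syntax may vary between products and versions. The Python wrapper only stores and prints the query text. It does not connect to any database.

```python
# The same question in three graph query languages, held as plain strings.
QUERIES = {
    "Cypher": """
MATCH (p:Person {name: 'Alice'})-[:KNOWS]->(friend:Person)
RETURN friend.name
""",
    "Gremlin": """
g.V().has('Person', 'name', 'Alice').out('KNOWS').values('name')
""",
    "SPARQL": """
PREFIX ex: <http://example.org/>
SELECT ?friendName
WHERE {
  ?person ex:name "Alice" .
  ?person ex:knows ?friend .
  ?friend ex:name ?friendName .
}
""",
}

for language, query in QUERIES.items():
    print(f"--- {language} ---")
    print(query.strip())
```

Cypher describes a pattern to match, Gremlin chains traversal steps, and SPARQL matches triples, so staff trained in one do not automatically read the others fluently.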

Lack of parallelism

Parallelism is the ability of the database engine to concurrently process both queries and updates submitted by multiple active tasks. Some graph databases offer parallelism, and others don’t.

End-users of relational databases take parallelism for granted.
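The sketch below shows the simplest form of what this involves: several tasks submit updates and queries against the same data at once, and the store must keep the result consistent. It uses a single coarse lock over a hypothetical in-memory adjacency list, which is an assumption for illustration. Database engines generally rely on finer-grained mechanisms, and this sketch makes no claim about how any particular product works.

```python
import threading
from concurrent.futures import ThreadPoolExecutor

# Hypothetical shared store: one node and its list of followers
adjacency = {"hub": []}
lock = threading.Lock()


def add_follower(n):
    """Update task: add one relationship to the shared node."""
    with lock:
        adjacency["hub"].append(f"user_{n}")


def count_followers(_):
    """Query task: read the current number of relationships."""
    with lock:
        return len(adjacency["hub"])


with ThreadPoolExecutor(max_workers=8) as pool:
    writes = [pool.submit(add_follower, n) for n in range(100)]
    reads = [pool.submit(count_followers, n) for n in range(100)]
    observed = [future.result() for future in reads]
    for future in writes:
        future.result()

final_count = len(adjacency["hub"])
print("followers after all updates:", final_count)
```

Each query sees some consistent count between 0 and 100, depending on how the tasks interleave, and no update is lost.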

Missing operational features

End-users of relational databases take operational features such as the following for granted:

Transactions and the associated rollback mechanism
Various data recovery options
Durability – guarantees that committed transactions survive permanently

Some graph databases don’t offer these operational features, and others offer them only partially or are still working on them.
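For readers less familiar with the first of these features, here is a minimal sketch of a transaction with rollback over a hypothetical in-memory graph. The transaction helper and the sample data are invented for illustration. The sketch restores a snapshot on failure and does not address durability or recovery, which require persistent storage.

```python
import copy
from contextlib import contextmanager

# Hypothetical graph: person -> set of people they know
graph = {"alice": {"bob"}, "bob": set()}


@contextmanager
def transaction(store):
    """Apply changes to store; restore the prior state if an error occurs."""
    snapshot = copy.deepcopy(store)
    try:
        yield store
    except Exception:
        store.clear()
        store.update(snapshot)  # rollback: discard every change made in the block
        raise


# A failed transaction leaves no partial changes behind.
try:
    with transaction(graph) as tx:
        tx["alice"].add("carol")
        tx["carol"] = set()
        raise ValueError("simulated failure before commit")
except ValueError:
    pass
print(graph)

# A successful transaction keeps its changes.
with transaction(graph) as tx:
    tx["bob"].add("alice")
print(graph)
```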

Graph database vendors

For more information about graph databases and vendor-specific assessments, please consult the Gartner Magic Quadrant for Cloud Database Management Systems.


Yogi Schulz
