04. Offbeat Databases - PART 2

As we continue this topic, I would like to thank all the websites that helped me gather this data; the reference lists are given in the appendix of each section.

1. Hierarchical Database System (HDBS):

HDBS system considered:

IBM Information Management System

Factors that drove this innovation:

A huge set of hierarchical data needed to be stored, organized and retrieved quickly. In 1969, for example, ICS/DL/I (Information Control System/Data Language/Interface), the original name of IMS, kept track of the huge bill of materials for the Saturn V rocket, which was made up of hundreds of thousands of parts.

Some Properties/information of an HDBS:

1. Advantages: Fast and easy data retrieval, easy to add/delete information, predictable data structure, efficient storage of data and good performance.

2. Disadvantages: Structural rigidity, no ability to store many-to-many relationships, difficult reorganization, and a narrower scope than an RDBMS, because fewer use cases involve purely hierarchical data.

3. The schema of a hierarchical database is defined by its tree-like organization: a root “parent” record links to various child records (subdirectory branches), and each child record may in turn link to further branches. This type of database is commonly used to represent hierarchical relationships, such as those found in organizational charts or family trees.

4. The hierarchical database structure dictates that, while a parent record can have several child records, each child record can have only one parent record. Data within records is stored in fields, and each field can contain only one value. Retrieving data from a hierarchical database requires traversing the tree downward, starting at the root node. Graph databases, by contrast, can hold complex many-to-many as well as hierarchical information.

5. Hierarchical database models are characterized by their simplicity, but also by their lack of flexibility. Unlike relational databases, hierarchical structures cannot describe many-to-many relationships, or any record that belongs to more than one parent, because each child record can have only a single parent.

6. A fundamental difference between segments in a hierarchical database and tables in a relational database is that, in a hierarchical database, segments are implicitly joined with each other. In a relational database, you must explicitly join two tables.

7. The key advantage of a hierarchical database is its ease of use. The one-to-many organization of data makes traversing the database simple and fast, which is ideal for use cases such as website drop-down menus or the folder structure of file systems in operating systems like Microsoft Windows.

8. On the downside, the tree-like organization of data requires top-to-bottom sequential searching, which is time-consuming. Because each child can have only one parent, data that belongs under several parents must also be stored repeatedly, which creates redundancy.

9. The retail industry’s most widely used programming system for managing complex databases and terminal networks is IBM’s Information Management System/Virtual Storage (IMS). It has the industry’s highest proven availability and data integrity for applications with very large transaction volumes.

10. IMS consists of three components:
• The Database Manager (IMS DB)
• The Transaction Manager (IMS TM)
• The Common Services to IMS DB/DC

11. At the start of 1987, IMS had more than 6,000 installations worldwide, representing banking, manufacturing, transportation, insurance, finance, and other businesses.

12. Most clients have several thousand workstations and terminals, with some having as many as 50,000.

13. Users have written more than 2 billion lines of application software on top of the IMS system.

14. IMS has maintained a rate of more than 1,000 database transactions per second on a large IBM computer accessed by 15,000 active terminals.

15. Indeed, more than 95 percent of the top Fortune 1000 companies use IMS to process more than 50 billion transactions a day and manage 15 million gigabytes of critical business data.

16. The commercial product had two main parts: a database system supporting a hierarchical, tree-structured data model, and transaction-processing software for handling complex, high-volume transactions such as order entry, inventory management, payroll and claims processing, airline or hotel reservations, financial applications, and other transaction-oriented applications.

17. A unique feature of IMS was its “queued system”: a system that receives all transactions as they arrive and holds them until they can be processed. For example, when an airline agent enters a transaction into his or her computer, the automated transaction manager takes care of updating IMS, so that another ticket agent doesn’t sell the same seat.

18. In its first two decades, customer investment in IMS applications grew to approximately 10 to 12 billion lines of code.

19. Some interesting quotes about IBM IMS:

“… almost ninety five percent of Fortune 1000 companies use IMS for their most critical IBM System z data management needs with more than 50 billion transactions running through IMS databases on a daily basis.”

“IBM Introduces New Version of IMS Software,” IBM press release on October 10, 2007

“Most of this country’s [USA’s] large corporations now use IMS for their biggest applications. … It’s especially suited to applications requiring continuous availability and very high transaction rates, such as automated teller machines. Other IMS applications include manufacturing inventory control and payroll processing.”

“IMS – yesterday, today, and maybe forever,” Vision San Jose 1990

“At this point, the world probably can’t do without IMS. Many of the largest corporations in the world depend on it to run their everyday business. The cost to the world and its companies if IMS were abandoned would be unbelievable.”

Don Lundberg, Systems Engineer, “Q&A: An interview with IMS’ Don Lundberg,” Vision San Jose, 1990

20. Who uses IMS? The top companies worldwide in many industries use IMS to run their daily operations:
• Aerospace
• Banking
• Communications
• Finance
• Government
• Health Care
• Insurance
• Manufacturing
• Retailing
• Technology

21. IMS offers a wide range of capabilities for implementing new application designs:
• Integration
• Open Standards
• Reliability
• Scalability
• Self-management
• Web serving
• XML processing
• Workload readiness for SOA
• Ownership value
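The parent/child rules in items 3, 4, 6 and 8 above can be illustrated with a small in-memory sketch. This is not IMS or its DL/I interface: the Segment class, the helper functions and the bill-of-materials data are all hypothetical, chosen only to show the single-parent rule, the implicit parent/child "join" when following a path from the root, and the top-down sequential search that visits every segment.

```python
from dataclasses import dataclass, field
from typing import Iterator, Optional


@dataclass
class Segment:
    """One record (segment) in a toy, hypothetical hierarchical database."""
    name: str
    fields: dict = field(default_factory=dict)
    # Back-link excluded from repr/compare so parent <-> child links cannot recurse.
    parent: Optional["Segment"] = field(default=None, repr=False, compare=False)
    children: list = field(default_factory=list)

    def add_child(self, child: "Segment") -> "Segment":
        # Hierarchical rule: every child record has exactly one parent.
        if child.parent is not None:
            raise ValueError(f"{child.name!r} already belongs to {child.parent.name!r}")
        child.parent = self
        self.children.append(child)
        return child


def walk(root: Segment) -> Iterator[Segment]:
    """Top-to-bottom, left-to-right (preorder) traversal starting at the root."""
    yield root
    for child in root.children:
        yield from walk(child)


def find_path(root: Segment, path: list) -> Optional[Segment]:
    """Follow a root-to-leaf path of segment names.

    No join condition is written: the parent/child links are the join.
    """
    if not path or root.name != path[0]:
        return None
    node: Optional[Segment] = root
    for name in path[1:]:
        node = next((c for c in node.children if c.name == name), None)
        if node is None:
            return None
    return node


def sequential_search(root: Segment, key: str, value: object) -> list:
    """Without an index, every segment is visited from the root downward."""
    return [seg for seg in walk(root) if seg.fields.get(key) == value]


# Hypothetical bill-of-materials data, loosely inspired by the Saturn V example.
rocket = Segment("rocket", {"model": "Saturn V"})
stage1 = rocket.add_child(Segment("stage1"))
stage2 = rocket.add_child(Segment("stage2"))
valve = stage1.add_child(Segment("valve", {"part_no": "V-100"}))

try:
    stage2.add_child(valve)  # a second parent is rejected
except ValueError as err:
    print("rejected:", err)  # rejected: 'valve' already belongs to 'stage1'

# So the shared part has to be stored again under its second parent.
stage2.add_child(Segment("valve", {"part_no": "V-100"}))

print(find_path(rocket, ["rocket", "stage2", "valve"]).fields)  # {'part_no': 'V-100'}
print(len(sequential_search(rocket, "part_no", "V-100")))        # 2
```

Because the valve cannot have two parents, the sketch ends up with two copies of the same part, which is exactly the redundancy described in item 8; a relational or graph database would instead reference one part record from both stages.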

Appendix:

Reference Lists:

https://en.wikipedia.org/wiki/Hierarchical_database_model

https://learn.microsoft.com/en-us/windows/win32/sysinfo/structure-of-the-registry

https://www.ibm.com/ibm/history/ibm100/us/en/icons/ibmims/

https://www.heavy.ai/technical-glossary/hierarchical-database

https://databasetown.com/hierarchical-database/

https://www.red-gate.com/simple-talk/databases/sql-server/t-sql-programming-sql-server/sql-server-graph-databases-part-4-working-hierarchical-data-graph-database/

https://www.ibm.com/docs/en/ims/14.1.0?topic=ims-comparison-hierarchical-relational-databases

http://zseries.marist.edu/pdfs/ztidbitz/22%20zTidBits%20%28IMS_Then&ToDay%29.pdf

https://analytics4all.org/2023/04/03/hierarchical-databases/

2. Time Series Database System (TSDBS):

TSDBS system considered:

InfluxDB

Factors that drove this innovation:

A time-series database (TSDB) is a database in which every data point is recorded against the point in time at which it was measured or occurred. It lets you store large volumes of timestamped data in a format that allows fast insertion and fast retrieval, to support complex analysis on that data.
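As a minimal sketch of why timestamp-ordered storage makes both insertion and retrieval fast, assume an in-memory series that keeps its timestamps sorted: points that arrive in time order are simply appended, and a time-range query uses binary search to find the slice. The TimeSeries class is hypothetical; real TSDBs layer on-disk layout, compression and indexing on top of this idea.

```python
from bisect import bisect_left, bisect_right
from typing import List, Tuple


class TimeSeries:
    """Toy in-memory series whose timestamps are always kept sorted."""

    def __init__(self) -> None:
        self._ts: List[int] = []      # timestamps (e.g. Unix seconds), ascending
        self._vals: List[float] = []  # value stored at the same index as its timestamp

    def insert(self, ts: int, value: float) -> None:
        if not self._ts or ts >= self._ts[-1]:
            # Common case: data arrives in time order, so this is a cheap append.
            self._ts.append(ts)
            self._vals.append(value)
        else:
            # Late, out-of-order point: slot it in at its sorted position.
            i = bisect_right(self._ts, ts)
            self._ts.insert(i, ts)
            self._vals.insert(i, value)

    def between(self, start: int, end: int) -> List[Tuple[int, float]]:
        """Points with start <= ts < end; binary search finds both ends in O(log n)."""
        lo = bisect_left(self._ts, start)
        hi = bisect_left(self._ts, end)
        return list(zip(self._ts[lo:hi], self._vals[lo:hi]))


series = TimeSeries()
for ts, value in [(100, 1.0), (110, 2.0), (120, 3.0), (105, 1.5)]:
    series.insert(ts, value)
print(series.between(100, 115))  # [(100, 1.0), (105, 1.5), (110, 2.0)]
```

The append fast path reflects the assumption that most time series data arrives roughly in order; out-of-order points still work, just more slowly.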

Some Properties/information of a TSDBS:

1. A time series DBMS is a database management system optimized for handling time series data, in which each entry is associated with a timestamp. For example, time series data may be produced by sensors, smart meters or RFIDs in the so-called Internet of Things, or may depict the stock tickers of a high-frequency stock trading system.

2. Time series DBMSs are designed to efficiently collect, store and query various time series with high transaction volumes. Although time series data can be managed with other categories of DBMS (from key-value stores to relational systems), its specific challenges often require specialized systems. For example, a query like 'SELECT SENSOR1_CPU_FREQUENCY / SENSOR2_HEAT' joins two time series based on their overlapping periods of time and outputs a single composite time series.

3. Popular use cases:

a. Monitoring IoT devices

b. Distributed observability and tracing

c. Logistics management

d. Geospatial applications (telemetry data)

4. A modern TSDB needs to handle both regular metrics (measurements taken at fixed intervals, such as every 10 seconds, as with DevOps or sensor data) and irregular events (discrete occurrences such as API requests or trades on a stock market).

5. A TSDB must prioritize fast ingestion, because time series data typically arrives continuously and in high volume.

6. High-precision data is kept for a short period of time, while summary data at medium or lower precision is retained for longer.

7. An agent or the database itself must continuously compute summaries from the high-precision data for longer term storage.

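8. Time series query patterns differ considerably from other database workloads: queries typically scan a range of time and aggregate over it, rather than looking up or updating individual records.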

9. On disk, the data is organized in a columnar format in which contiguous blocks of time are stored for each measurement, tag set and field set. Each field's values are therefore laid out sequentially on disk for a block of time, which makes calculating aggregates over a single field very fast.

10. InfluxDB stack:

● Telegraf is built for data collection (with 200+ plugins that integrate with other products).
● InfluxDB is the database and storage tier.
● Chronograf is for visualization of time series data.
● Kapacitor is the rules engine for processing, monitoring, and alerting.

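11. Sample data point in InfluxDB line protocol:

cpu,host=serverA,region=uswest idle=23,user=42,system=12 1549063516

Here cpu is the measurement, host and region are tags, idle, user and system are fields, and 1549063516 is the timestamp (Unix time in seconds).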
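The item 11 sample, written as a standard line-protocol record, can be parsed, and the summarization described in items 6 and 7 can be sketched, in a few lines of Python. The parser is deliberately simplified and hypothetical: it assumes no escaped characters, no spaces inside names or values, and plain numeric fields without type suffixes, all of which real line protocol handles. The parse_line and downsample helpers and the extra raw points are illustrative only and are not part of any InfluxDB client library.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Point:
    measurement: str
    tags: Dict[str, str]
    fields: Dict[str, float]
    timestamp: int


def parse_line(line: str) -> Point:
    """Parse one simplified line-protocol record: 'measurement,tags fields timestamp'.

    Assumes no escaping, no spaces inside names/values and plain numeric fields.
    """
    series_key, field_set, ts = line.strip().split(" ")
    measurement, *tag_pairs = series_key.split(",")
    tags = dict(pair.split("=", 1) for pair in tag_pairs)
    fields = {}
    for pair in field_set.split(","):
        key, value = pair.split("=", 1)
        fields[key] = float(value)
    return Point(measurement, tags, fields, int(ts))


def downsample(points: List[Point], field_key: str, bucket_s: int) -> Dict[int, float]:
    """Average one field over fixed windows of bucket_s seconds (a summary series)."""
    sums: Dict[int, float] = defaultdict(float)
    counts: Dict[int, int] = defaultdict(int)
    for p in points:
        if field_key in p.fields:
            start = p.timestamp - p.timestamp % bucket_s  # window start time
            sums[start] += p.fields[field_key]
            counts[start] += 1
    return {start: sums[start] / counts[start] for start in sorted(sums)}


p = parse_line("cpu,host=serverA,region=uswest idle=23,user=42,system=12 1549063516")
print(p.measurement, p.tags, p.fields["idle"], p.timestamp)
# cpu {'host': 'serverA', 'region': 'uswest'} 23.0 1549063516

# Hypothetical high-precision points taken every 10 s, rolled up into 60 s averages.
raw = [
    parse_line(f"cpu,host=serverA idle={v},user=42,system=12 {1549063500 + 10 * i}")
    for i, v in enumerate([20, 22, 24, 26, 28, 30, 32])
]
print(downsample(raw, "idle", 60))  # {1549063500: 25.0, 1549063560: 32.0}
```

In a real deployment the raw 10 s points would be expired after a short retention period, while the 60 s averages would be kept for longer, as items 6 and 7 describe.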

Appendix:

Reference Lists:

https://db-engines.com/en/article/Time+Series+DBMS

https://db-engines.com/en/blog_post/71

https://www.influxdata.com/modern-time-series-platform/

https://github.com/chengshiwen/influxdb-cluster/wiki/Home-Eng

https://www.influxdata.com/blog/influxdb-and-kafka-how-influxdata-uses-kafka-in-production/

https://bitworks.software/2019-03-21-improving-influxdb-with-apache-kafka.html

[Figure: InfluxDB architecture]

3. Event Stream Database System (ESDBS):

ESDBS system considered:

A Kafka-driven streaming platform.

Factors that drove this innovation:

The need to build a data service at the heart of the company, where applications can find the important datasets that make the company work (reference data, master data, golden-source data, etc.).

Some Properties/information of an ESDBS:

1. Newly created or modified master data must be published to the stream, so that every data mart plugged into the event stream receives it.

2. Kafka has an asynchronous publish-subscribe architecture that enables trillions of messages a day to be transported around the organization.

3. Kafka’s replayable logs (used as event stores) decouple services from one another, much like a messaging system does, but they also provide a central point of storage that is fault-tolerant and scalable: a shared source of truth that any application can fall back on.

4. In this architecture, data moves to the applications, rather than applications reaching out to the data.

5. In companies, the business facts that SOA services or microservices choose to share are the most important facts of all: they are the truth that the rest of the business is built on. In the proposed architecture, it is this data that passes through the Kafka messaging system. Moving the data to the applications, instead of having them call one another at runtime, gives them a level of operability and control that is unachievable with a direct runtime dependency.

6. Kafka provides an asynchronous protocol for connecting programs together. The key difference between HTTP/RPC protocols and Kafka is the presence of a broker: a separate piece of infrastructure that broadcasts messages to any programs interested in them and stores the messages for as long as needed. This makes Kafka well suited to streaming and fire-and-forget messaging.

7. Kafka, unlike ESB platforms, is a streaming platform, and as such adds emphasis on high throughput events and stream processing.

8. Kafka provides storage; production topics holding hundreds of terabytes are not uncommon. It also offers KSQL and Kafka Streams (KStreams) for querying and processing streams.

9. It is not uncommon to see Kafka clusters with more than 100 nodes (Netflix uses 200+ nodes).

10. Topics are divided into partitions, and up to about 4,000 partitions per broker may be permitted.

11. A single Kafka cluster can sometimes have 200,000 partitions and 50+ brokers.

12. At New Relic, a single cluster of around 100 nodes, spanning 3 data centers, was deployed to process 30 GB/s.

13. Less demanding deployments typically use clusters of 5 to 10 nodes.

14. Within a consumer group, each partition is consumed by exactly one consumer process.

15. In a typical deployment, a single topic may have up to 10 partitions.

16. The maximum message size can be set to 1 MB (roughly Kafka's default).

17. As a storage layer, Kafka can hold 100 TB of data or more.
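The partition, consumer and replay behavior in items 1, 3 and 14 can be modeled with a toy in-memory log. This is not the Kafka client API: Topic, Consumer and assign are hypothetical names, the partitioner uses zlib.crc32 as a stand-in for Kafka's own key hashing, and offsets live in memory instead of being committed to a broker. It shows that records with the same key land in the same partition and keep their order, that within one consumer group each partition is read by exactly one worker, and that, because reads never delete data, a second group (another data mart) sees everything and any consumer can rewind and replay.

```python
import zlib
from typing import Dict, List, Tuple

Record = Tuple[int, int, str, str]  # (partition, offset, key, value)


class Topic:
    """Toy stand-in for a topic: one append-only list (log) per partition."""

    def __init__(self, name: str, partitions: int) -> None:
        self.name = name
        self.logs: List[List[Tuple[str, str]]] = [[] for _ in range(partitions)]

    def produce(self, key: str, value: str) -> Tuple[int, int]:
        # Same key -> same partition, so records for one key stay in order.
        # crc32 is only a stand-in for the hash Kafka's partitioner really uses.
        p = zlib.crc32(key.encode()) % len(self.logs)
        self.logs[p].append((key, value))
        return p, len(self.logs[p]) - 1  # (partition, offset)


def assign(partitions: int, workers: List[str]) -> Dict[str, List[int]]:
    """Round-robin: each partition is owned by exactly one worker of the group."""
    plan: Dict[str, List[int]] = {w: [] for w in workers}
    for p in range(partitions):
        plan[workers[p % len(workers)]].append(p)
    return plan


class Consumer:
    def __init__(self, topic: Topic, partitions: List[int]) -> None:
        self.topic = topic
        self.offsets = {p: 0 for p in partitions}  # next offset to read, per partition

    def poll(self) -> List[Record]:
        out: List[Record] = []
        for p, start in self.offsets.items():
            log = self.topic.logs[p]
            out.extend((p, off, *log[off]) for off in range(start, len(log)))
            self.offsets[p] = len(log)  # reading moves the offset, never the data
        return out

    def seek_to_beginning(self) -> None:
        # Replay: the retained log is the shared source of truth.
        for p in self.offsets:
            self.offsets[p] = 0


# Hypothetical master-data topic with three change events.
topic = Topic("customer-master", partitions=4)
for key, change in [("c1", "created"), ("c2", "created"), ("c1", "address changed")]:
    topic.produce(key, change)

# Data mart A: one consumer group whose two workers split the partitions.
plan = assign(4, ["worker-1", "worker-2"])
print(plan)  # {'worker-1': [0, 2], 'worker-2': [1, 3]}
group = {name: Consumer(topic, parts) for name, parts in plan.items()}
first = [rec for c in group.values() for rec in c.poll()]
print(len(first))  # 3: each record delivered once across the group

# Data mart B: an independent group with its own offsets still sees everything.
mart_b = Consumer(topic, list(range(4)))
print(len(mart_b.poll()))  # 3

# Replay: worker-1 rewinds and re-reads its partitions from offset 0.
group["worker-1"].seek_to_beginning()
replayed = group["worker-1"].poll()
```

Real Kafka adds replication, broker-side offset commits and automatic rebalancing when workers join or leave, but the core idea of a retained, partitioned log with per-consumer positions is the same.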

Appendix:

Reference Lists:

Content references are from the Confluent website (author: Ben Stopford).

[Figure: Core components of a streaming platform]

[Figure: A sample of how different service applications use Kafka for data]

Hari Om!

S-T-F
