InfinityDB
InfinityDB is an all-Java embedded database engine that is deployed in handheld devices, on servers, on workstations, and in distributed settings. The design is based on a proprietary lockless, concurrent, B-Tree architecture that enables client programmers to reach high levels of performance without risk of failures. [1] Data is stored to and retrieved from a single embedded database file using the InfnityDB API that allows direct access to the variable length item spaces. Database client programmers can construct traditional relations as well as specialized models that directly satisfy the needs of the dependent application. There's no limit to the number of items, database size, or JVM size, so InfinityDB can function in both the smallest environment that provides random access storage and can be scaled to large settings. Traditional relations and specialized models can be directed to the same database file. InfinityDB can be optimized for standard relations as well as all other types of data, allowing client applications to perform at a minimum of one million operations per second on a virtual, 8-core system.
AirConcurrentMap, is an in-memory map that implements the Java ConcurrentMap interface,[2] but internally it uses a multi-core design so that its performance and memory make it the fastest Java Map when ordering is performed and it holds medium to large numbers of entries.[3] AirConcurrentMap iteration is faster than any Java Map iterators, regardless of the specific map type.
Relational and Specialized or Custom Data Model
Data elements include all Java primitive data types, dates, strings, small char or byte arrays, or byte strings. These are called 'components' and they are atomic. Components can be concatenated into short composites called 'Items' which are the unit of storage and retrieval. Higher-level structures that combine these Items include unlimited size records of an unlimited number of columns or attributes, with complex attribute values of unlimited size. Keys may be a composition of components. Attribute values can be ordered sets of composite components, character large objects (CLOB's), binary large objects (BLOB's), or unlimited sparse arrays. Other higher-level structures built of multiple Items include key/value associations like ordered maps, ordered sets, Entity-Attribute-Value nets of quadruples, trees, DAG's, taxonomies, or full-text indexes. Mixtures of these can occur along with custom client-defined structures.
Data Encoding
An 'ItemSpace' represents the entire database, and it is a simple ordered set of Items, with no other state. An Item is actually stored with each component encoded in variable-length binary form in a char array, with components being self-describing in a standard format which sorts correctly. Programmers deal with the components only as primitives, and the stored data is strongly typed. Data is not stored as text to be parsed with weak typing as in JSON or XML, nor is it parsed out of programmer-defined binary stream representations. There are no custom client-devised binary formats that can grow brittle, and which can have security, documentation, upgrade, testing, versioning, scaling, and debugging problems, such as is the case with Java Object serialization.
Scaling
All access to the system is via a few basic methods that can store and retrieve in order one variable-length Item from 0 to up to about 1KB at a time at approximately 1M operations/second when in memory. Typical Items are about 30 bytes. Because each low-level operation is limited to one Item, operations on short pieces of data are fast. Such small operations do not require for example formatting or reading and parsing entire JSON or XML texts or entire Java Object serialization graphs. Space and performance scaling of an ItemSpace is smooth as any size of low-level or high-level structure is created, grows, shrinks, or disappears. The 'chunked' forms have practical minimum and maximum sizes per chunk. For performance and efficiency, the Items are stored inside a single B-Tree prefix-compressed and variable length as an uninterpreted sequence of bytes for further compression. The The B-Tree may typically grow to the 100GB's range but has no limits. There is only one file, so there is no log or other files to write to and flush. Recovery after abrupt termination is immediate and requires no slow log replay. Large data loading can be globally transactional with unlimited data size limited primarily by disk I/O. Smaller ACID transactions employ optimistic locking and are limited by either in-memory operation speed or disk I/O. Non-transactional single-Item operations are fastest. Each thread uses a different core with little interference and no thread limitation.
Schemaless Upgrade and Downgrade
schema upgrade when structures are extended or modified is done by adding or removing Items in new ways at runtime, and there are no upgrade scripts - hence the data model is No-SQL and schemaless. Because an empty or non-existent piece of data requires no space by virtue of having no Items associated with it, it is possible to extend the structure simply by adding Items of new kinds. If Items are removed, all logical and physical space is immediately reclaimed.
For example, backwards compatibility with older databases in a relational model while making a single-valued attribute into a multi-value attribute can be achieved simply by storing more values (one per Item) alongside the existing one without deleting the existing one, since every attribute is already logically multi-value and is logically composite. There is little code change and no internal or external database structure change. If a value is deleted and it is the last value of a multi-value attribute, all of its space is recovered immediately. In other words, every attribute can logically store an ordered set of optionally composite values, but there is no cost for the generality in common cases, and no cost at all for null values.
If a new table is to be added, it is created at the moment its first row is inserted. The first row appears only when the first attribute value is added to it. Forwards compatibility with future databases is possible because future databases may have new tables or any other structure that can be discovered and used, can be simply ignored or can be handled opaquely by older code.
Designed for High Performance and Minimal Administrative Needs
Each InfinityDB instance stores data into a single database file and does not require additional log or rollback files of any type. Database consistency is guaranteed with the Commit function that can be called as often as the dependent application requires. In the event of any catastrophe except power failure or other hardware malfunction, the database is guaranteed to be consistent with the status as of completion of the last Commit. InfinityDB minimizes the size of its database file through four types of compression (prefix, suffix, zlib, and UTF-8).
Immediate Garbage Collection
As data structures grow and shrink, freed space is reclaimed immediately and made available for other structures. Systems can run indefinitely without gradual space leaks or long interruptions during garbage reclamation phases. When a data structure becomes empty, all of its space is recycled, rather than leaving behind 'tombstones' or other place holders. For example, a possibly very large multi-value attribute may shrink to one value, becoming as efficient as any single-valued attribute, and if that last value is deleted, all space for it is reclaimed, including the space for the attribute it was attached to, and if a row has only attributes with no values, the row is reclaimed completely as well. If a table loses all of its rows, the space for the table is reclaimed. Any size or type of data structure has this property. There are no reference counters, hence any type of graph is incrementally collected automatically.
Products
InfinityDB Version 3.0 features:
- Concurrent, multi-threaded processing on multiple cores without locks.
- This patent-pending multi-core concurrency increases performance on multi-core platforms, such as the Intel i7 by approximately seven times.
AirConcurrentMap is the fastest Java Map when concurrency and ordering is done on maps that are medium to large in size.
- The same calls can be made to AirConcurrentMap as those that are used with the Java ordered concurrent Maps like ConcurrentSkipListMap; this means that it is immediately possible to compare the performance.
- Multi-core concurrency is key to the design of this product. .
History
Roger L. Deran designed and developed the Infinity Database Engine over 20 years ago and holds US Patent 5283894.[4] The Infinity Database Engine was first deployed in Intel 8088 assembly language in the ROSCOR sports video editor (RSVE) that was licensed to NFL teams in the 1980s. Lexicon purchased the RSVE in 1989, and greatly expanded its deployment to all types of professional and college sports.[5] Java version 2.0 added transactionality, and version 3.0 added concurrency features which are patent pending, and apply to InfinityDB as well as AirConcurrentMap. Infinity DB remains in active use in thousands of sites of various kinds, while AirConcurrentMap is new.
Uses of the all-JAVA InfinityDB, marketed by Boiler Bay Inc. since 2002, include:
- consolidation of pharmaceutical and medical data
- collection, description, consolidation, and sharing of ornithological data
- representation of taxonomies of various types
- programming environment tools such as source code repository navigation
- text indexers
- email consolidation systems
- distributed industrial data collection systems.
References
- ↑ Peters, L & Lavers, T (2008). Swing Extreme Testing: The Extreme Approach to Complete Java Application Testing. Packt Publishing. p. 224.
- ↑ https://docs.oracle.com/javase/tutorial/collections/interfaces/map.html
- ↑ http://boilerbay.com/docs/AirConcurrentMap_Performance_Testing.pdf
- ↑ US 5283894 "Lockless concurrent B-tree index meta access method for cached nodes"
- ↑ New York Times - Sports World Specials: Video Technology; Custom Replays