Simplified Application Development.
No new code is needed for reading or writing the application file.
One has merely to link against the SQLite library, or include the
single "sqlite3.c" source file with the rest of the
application C code, and SQLite will take care of all of the application
file I/O. This can reduce application code size by many thousands of
lines, with corresponding savings in development and maintenance costs.
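As a sketch of how little I/O code is involved, consider the following, written with Python's bundled sqlite3 module. The "note" table and its contents are hypothetical stand-ins for an application's document data; no file-parsing or file-writing code appears anywhere:

```python
import sqlite3

# In-memory database for illustration; a real application would pass
# a document filename (with any custom suffix) to connect().
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE note(id INTEGER PRIMARY KEY, body TEXT)")
conn.execute("INSERT INTO note(body) VALUES (?)", ("buy milk",))
conn.commit()

# Reading the "document" back is a query, not hand-written parsing code.
rows = conn.execute("SELECT body FROM note").fetchall()
```

All serialization, page management, and disk I/O happen inside the library.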
SQLite is one of the
most used software libraries in the world.
There are literally tens of billions of SQLite database files in use
daily, on smartphones and gadgets and in desktop applications.
SQLite is carefully tested and proven reliable. It is not
a component that needs much tuning or debugging, allowing developers
to stay focused on application logic.
Single-File Documents.
An SQLite database is contained in a single file, which is easily
copied, moved, or attached. The "document" metaphor is preserved.
SQLite does not have any file naming requirements
and so the application can use any custom file suffix that it wants
to help identify the file as "belonging" to the application.
SQLite database files contain a 4-byte Application ID in
their headers that can be set to an application-defined value
and then used to identify the "type" of the document for utility
programs such as file(1), further
enhancing the document metaphor.
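Setting and reading the Application ID takes a single pragma. In this sketch (Python's bundled sqlite3 module) the four-byte code is an invented example, not a registered ID:

```python
import sqlite3

APP_ID = 0x4D594150  # made-up 4-byte code ("MYAP") for a hypothetical app

conn = sqlite3.connect(":memory:")  # a real app would open its document file
conn.execute("PRAGMA application_id = %d" % APP_ID)

# The value is stored in the file header and can be read back,
# or inspected by external tools such as file(1).
stored = conn.execute("PRAGMA application_id").fetchone()[0]
```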
High-Level Query Language.
SQLite is a complete relational database engine, which means that the
application can access content using high-level queries. Application
developers need not spend time thinking about "how" to retrieve the
information they need from a document. Developers write SQL that
expresses "what" information they want and let the database engine
figure out how best to retrieve that content. This helps developers
operate "heads up" and remain focused on solving the user's problem,
and avoids time spent "heads down" fiddling with low-level file formats.
A pile-of-files format can be viewed as a key/value database.
A key/value database is better than no database at all.
But without transactions or indices or a high-level query language or
a proper schema,
it is much harder and more error prone to use a key/value database than
a relational database.
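The "what, not how" style looks like this in practice. The album/track schema below is a made-up illustration, shown via Python's sqlite3 module; with a pile-of-files format, the equivalent join-and-count would be hand-written traversal code:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE album(id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE track(id INTEGER PRIMARY KEY, title TEXT, album_id INT);
    INSERT INTO album VALUES (1, 'First'), (2, 'Second');
    INSERT INTO track VALUES (1, 'a', 1), (2, 'b', 1), (3, 'c', 2);
""")

# The query states *what* is wanted; SQLite decides how to fetch it.
counts = conn.execute("""
    SELECT album.name, COUNT(*)
      FROM track JOIN album ON album.id = track.album_id
     GROUP BY album.id
     ORDER BY album.id
""").fetchall()
```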
Accessible Content.
Information held in an SQLite database file is accessible using
commonly available open-source command-line tools - tools that
are installed by default on Mac and Linux systems and that are
freely available as a self-contained EXE file on Windows.
Unlike with custom file formats, no application-specific program is
required to read or write content in an SQLite database.
An SQLite database file is not an opaque blob. It is true
that command-line tools such as text editors or "grep" or "awk" are
not useful on an SQLite database, but the SQL query language is a much
more powerful and convenient way for examining the content, so the
inability to use "grep" and "awk" and the like is not seen as a loss.
An SQLite database is a well-defined and well-documented
file format that is in widespread use by literally millions of applications
and is backwards compatible to its inception in 2004 and which promises
to continue to be compatible in decades to come. The longevity of
SQLite database files is particularly important to bespoke applications,
since it allows the document content to be accessed far in the
future, long after all traces of the original application have been lost.
Data lives longer than code.
SQLite databases are recommended by the US Library of Congress
as a storage format for long-term preservation of digital content.
Cross-Platform.
SQLite database files are portable between 32-bit and 64-bit machines and
between big-endian and little-endian architectures and between any of the
various flavors of Windows and Unix-like operating systems.
The application using an SQLite application file format can store
binary numeric data without having to worry about the byte-order of
integers or floating point numbers.
Text content can be read or written as UTF-8, UTF-16LE, or UTF-16BE and
SQLite will automatically perform any necessary translations on-the-fly.
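A short sketch of the on-the-fly translation, again via Python's sqlite3 module. Note that the encoding pragma must be issued before the first table is created; the table and text are illustrative:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Choose the on-disk text encoding before any tables exist.
conn.execute("PRAGMA encoding = 'UTF-16le'")
conn.execute("CREATE TABLE t(x TEXT)")
conn.execute("INSERT INTO t VALUES ('naïve 漢字')")

# The application reads and writes ordinary strings; SQLite translates
# to and from the chosen storage encoding transparently.
text = conn.execute("SELECT x FROM t").fetchone()[0]
encoding = conn.execute("PRAGMA encoding").fetchone()[0]
```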
Atomic Transactions.
Writes to an SQLite database are atomic.
They either happen completely
or not at all, even during system crashes or power failures. So
there is no danger of corrupting a document just because the power happened
to go out at the same instant that a change was being written to disk.
SQLite is transactional, meaning that multiple changes can be grouped
together such that either all or none of them occur, and so that the
changes can be rolled back if a problem is found prior to commit.
This allows an application to make a change incrementally, then run
various sanity and consistency checks on the resulting data prior to
committing the changes to disk. The Fossil version-control system
uses this technique
to verify that no repository history has been lost prior to each change.
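The check-before-commit pattern might look like the following sketch, using Python's sqlite3 module; the account schema and the overdraft rule are invented for illustration:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE account(name TEXT PRIMARY KEY, balance INT)")
conn.execute("INSERT INTO account VALUES ('a', 100), ('b', 0)")
conn.commit()

try:
    # Make changes incrementally inside one transaction...
    conn.execute("UPDATE account SET balance = balance - 150 WHERE name='a'")
    conn.execute("UPDATE account SET balance = balance + 150 WHERE name='b'")
    # ...then run a sanity check before committing.
    (bad,) = conn.execute(
        "SELECT COUNT(*) FROM account WHERE balance < 0").fetchone()
    if bad:
        raise ValueError("overdraft")
    conn.commit()
except ValueError:
    conn.rollback()  # every change in the transaction is undone

balances = dict(conn.execute("SELECT name, balance FROM account"))
```

Since the check fails here, the rollback leaves the document exactly as it was.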
Incremental And Continuous Updates.
When writing to an SQLite database file, only those parts of the file that
actually change are written out to disk. This makes the writing happen faster
and saves wear on SSDs. This is an enormous advantage over custom
and wrapped pile-of-files formats, both of which usually require a
rewrite of the entire document in order to change a single byte.
Pure pile-of-files formats can also
do incremental updates to some extent, though the granularity of writes is
usually larger with pile-of-file formats (a single file) than with SQLite
(a single page).
SQLite also supports continuous update.
Instead of collecting changes in memory and then writing
them to disk only on a File/Save action, changes can be written back to
the disk as they occur. This avoids loss of work on a system crash or
power failure. An automated undo/redo stack, managed using triggers,
can be kept in the on-disk database, meaning that undo/redo can occur
across session boundaries.
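A bare-bones sketch of a trigger-managed undo log follows; it illustrates the idea only, not a complete undo/redo design, and the "doc" schema is hypothetical:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE doc(id INTEGER PRIMARY KEY, body TEXT);
    -- Each change records the SQL statement that would reverse it.
    CREATE TABLE undolog(seq INTEGER PRIMARY KEY, sql TEXT);
    CREATE TRIGGER doc_undo AFTER UPDATE ON doc BEGIN
        INSERT INTO undolog(sql) VALUES (
            'UPDATE doc SET body=' || quote(old.body) ||
            ' WHERE id=' || old.id);
    END;
    INSERT INTO doc VALUES (1, 'draft one');
""")

conn.execute("UPDATE doc SET body='draft two' WHERE id=1")

# "Undo": pop the most recent compensating statement and run it.
seq, sql = conn.execute(
    "SELECT seq, sql FROM undolog ORDER BY seq DESC LIMIT 1").fetchone()
conn.execute(sql)
conn.execute("DELETE FROM undolog WHERE seq=?", (seq,))

body = conn.execute("SELECT body FROM doc WHERE id=1").fetchone()[0]
```

Because the undolog table lives in the same database file as the document, the undo stack survives application restarts.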
Easily Extensible.
As an application grows, new features can be added to an
SQLite application file format simply by adding new tables to the schema
or by adding new columns to existing tables. Adding columns or tables
does not change the meaning of prior queries, so with a
modicum of care to ensure that the meaning of legacy columns and
tables is preserved, backwards compatibility is maintained.
It is possible to extend custom or pile-of-files formats too, of course,
but doing so is often much harder. If indices are added, then all application
code that changes the corresponding tables must be located and modified to
keep those indices up-to-date. If columns are added, then all application
code that accesses the corresponding table must be located and modified to
take into account the new columns.
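The schema-extension approach can be seen in a few lines. In this sketch (Python's sqlite3 module, hypothetical "note" schema) a later version of the application adds a column and a table, and the legacy query still returns exactly what it did before:

```python
import sqlite3

# Version 1 of a hypothetical application's file format.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE note(id INTEGER PRIMARY KEY, body TEXT)")
conn.execute("INSERT INTO note(body) VALUES ('hello')")

# Version 2 adds features by extending the schema in place.
conn.execute("ALTER TABLE note ADD COLUMN starred INT DEFAULT 0")
conn.execute("CREATE TABLE tag(note_id INT, label TEXT)")

# A query written for version 1 is unaffected by the upgrade.
legacy = conn.execute("SELECT body FROM note").fetchall()
```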
Performance.
In many cases, an SQLite application file format will be
faster than a pile-of-files format or
a custom format. In addition to being faster for raw reads and
writes, SQLite often dramatically improves start-up times because
instead of having to
read and parse the entire document into memory, the application can
do queries to extract only the information needed for the initial screen.
As the application progresses, it only needs to load as much material as
is needed to draw the next screen, and can discard information from
prior screens that is no longer in use. This helps keep the memory
footprint of the application under control.
A pile-of-files format can be read incrementally just like SQLite.
But many developers are surprised to learn that SQLite can read and
write smaller BLOBs (less than about 100KB in size) from its database
faster than those same blobs can be read or written as separate files
from the filesystem. (See
35% Faster Than The Filesystem and
Internal Versus External BLOBs for further information.)
There is overhead associated with operating a relational
database engine, but one should not assume that direct file I/O
is faster than SQLite database I/O; it often is not.
In either case, if performance problems do arise in an SQLite application,
those problems can often be resolved by adding one or two CREATE INDEX
statements to the schema, or perhaps by running ANALYZE once,
without having to touch a single line of
application code. But if a performance problem comes up in a custom or
pile-of-files format, the fix will often require extensive changes
to application code to add and maintain new indices or to extract
information using different algorithms.
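The one-statement fix can be observed directly with EXPLAIN QUERY PLAN. The event table here is hypothetical, and the sketch uses Python's sqlite3 module:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE event(id INTEGER PRIMARY KEY, kind TEXT, ts INT)")
conn.executemany("INSERT INTO event(kind, ts) VALUES (?, ?)",
                 [("click", i) for i in range(1000)])

query = "SELECT COUNT(*) FROM event WHERE kind = ?"

# Without an index, the plan is a full-table SCAN.
plan_before = conn.execute("EXPLAIN QUERY PLAN " + query,
                           ("click",)).fetchone()[3]

conn.execute("CREATE INDEX event_kind ON event(kind)")  # the whole "fix"

# The same query, untouched, now uses the index.
plan_after = conn.execute("EXPLAIN QUERY PLAN " + query,
                          ("click",)).fetchone()[3]
```

No application code changed; only the schema did.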
Concurrent Use By Multiple Processes.
SQLite automatically coordinates concurrent access to the same
document from multiple threads and/or processes. Two or more
applications can connect and read from the same document at the
same time. Writes are serialized, but as writes normally only
take milliseconds, applications simply take turns writing.
SQLite automatically ensures that the low-level format of the
document is uncorrupted. Accomplishing the same with a custom
or pile-of-files format, in contrast, requires extensive support
in the application. And the application logic needed to support
concurrency is a notorious bug-magnet.
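Two independent connections to one document file can be sketched with Python's sqlite3 module; the second connection could just as well live in a different process:

```python
import os
import sqlite3
import tempfile

fd, path = tempfile.mkstemp(suffix=".db")
os.close(fd)

writer = sqlite3.connect(path)
writer.execute("CREATE TABLE msg(body TEXT)")
writer.execute("INSERT INTO msg VALUES ('hi')")
writer.commit()

# A second, independent connection reads the same document
# while the first remains open. SQLite coordinates the locking.
reader = sqlite3.connect(path)
reader.execute("PRAGMA busy_timeout = 5000")  # wait on contention, don't fail
seen = reader.execute("SELECT body FROM msg").fetchall()
```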
Multiple Programming Languages.
Though SQLite is itself written in ANSI-C, interfaces exist for
just about every other programming language you can think of:
C++, C#, Objective-C, Java, Tcl, Perl, Python, Ruby, Erlang,
and many others. Developers can work in whichever
language they are most comfortable with and which best matches
the needs of the project.
An SQLite application file format is a great
choice in cases where there is a collection or "federation" of
separate programs, often written in different languages and by
different development teams.
This comes up commonly in research or laboratory
environments where one team is responsible for data acquisition
and other teams are responsible for various stages of analysis.
Each team can use whatever hardware, operating system,
programming language and development methodology that they are
most comfortable with, and as long as all programs use an SQLite
database with a common schema, they can all interoperate.
If the application file format is an SQLite database, the complete
documentation for that file format consists of the database schema,
with perhaps a few extra words about what each table and column
represents. The description of a custom file format,
on the other hand, typically runs on for hundreds of
pages. A pile-of-files format, while much simpler and easier to
describe than a fully custom format, still tends to be much larger
and more complex than an SQL schema dump, since the names and format
for the individual files must still be described.
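Because the schema lives inside the file, the "documentation" can even be extracted from a document itself. A small sketch, using Python's sqlite3 module and a hypothetical schema:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE note(id INTEGER PRIMARY KEY, body TEXT);
    CREATE INDEX note_body ON note(body);
""")

# sqlite_master holds the CREATE statements for every table and index,
# i.e. a complete, always-current description of the file format.
schema = [row[0] for row in conn.execute(
    "SELECT sql FROM sqlite_master WHERE sql IS NOT NULL")]
```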
This is not a trivial point. A clear, concise, and easy to understand
file format is a crucial part of any application design.
Fred Brooks, in his all-time best-selling computer science text,
The Mythical Man-Month, says:
Representation is the
essence of computer programming.
Show me your flowcharts and conceal your tables, and I shall
continue to be mystified. Show me your tables, and I won't usually
need your flowcharts; they'll be obvious.
Rob Pike, in his
Rules of Programming, expresses the same idea this way:
Data dominates. If you've chosen the right data structures
and organized things well, the algorithms will almost always
be self-evident. Data structures, not algorithms, are central
to programming.
Linus Torvalds used different words to say
much the same thing on the Git mailing list on 2006-06-27:
Bad programmers worry about the code. Good programmers worry
about data structures and their relationships.
The point is this: an SQL database schema almost always does
a far better job of defining and organizing an application's
data structures and their relationships than a custom or
pile-of-files format does.
And having clear, concise, and well-defined representation
almost always results in an application that performs better,
has fewer problems, and is easier to develop and maintain.