177 stories
·
1 follower

11 Foot 8 – The Can-opener Bridge

1 Comment

On Dec. 14, the driver of yet another tall boxtruck ignored the red traffic light at Gregson and Peabody, ignored the overheight warning signage and slammed into the 11foot8 bridge. I did not see that crash, but I noticed some new debris a few days later. It took me while to find the footage, but here it is. Sorry for the poor video quality at the end, but something must have interfered with the the WiFi signal from the camera […]

Read More »

Let's block ads! (Why?)

Read the whole story
GaryBIshop
16 days ago
reply
What a great site
Share this story
Delete

SQLite as an Application File Format

1 Comment

Simplified Application Development. No new code is needed for reading or writing the application file. One has merely to link against the SQLite library, or include the single "sqlite3.c" source file with the rest of the application C code, and SQLite will take care of all of the application file I/O. This can reduce application code size by many thousands of lines, with corresponding saving in development and maintenance costs.

SQLite is one of the most used software libraries in the world. There are literally tens of billions of SQLite database files in use daily, on smartphones and gadgets and in desktop applications. SQLite is carefully tested and proven reliable. It is not a component that needs much tuning or debugging, allowing developers to stay focused on application logic.

  • Single-File Documents. An SQLite database is contained in a single file, which is easily copied or moved or attached. The "document" metaphor is preserved.

    SQLite does not have any file naming requirements and so the application can use any custom file suffix that it wants to help identify the file as "belonging" to the application. SQLite database files contain a 4-byte Application ID in their headers that can be set to an application-defined value and then used to identify the "type" of the document for utility programs such as file(1), further enhancing the document metaphor.

  • High-Level Query Language. SQLite is a complete relational database engine, which means that the application can access content using high-level queries. Application developers need not spend time thinking about "how" to retrieve the information they need from a document. Developers write SQL that expresses "what" information they want and let the database engine to figure out how to best retrieve that content. This helps developers operate "heads up" and remain focused on solving the user's problem, and avoid time spent "heads down" fiddling with low-level file formatting details.

    A pile-of-files format can be viewed as a key/value database. A key/value database is better than no database at all. But without transactions or indices or a high-level query language or a proper schema, it is much harder and more error prone to use a key/value database than a relational database.

  • Accessible Content. Information held in an SQLite database file is accessible using commonly available open-source command-line tools - tools that are installed by default on Mac and Linux systems and that are freely available as a self-contained EXE file on Windows. Unlike custom file formats, application-specific programs are not required to read or write content in an SQLite database. An SQLite database file is not an opaque blob. It is true that command-line tools such as text editors or "grep" or "awk" are not useful on an SQLite database, but the SQL query language is a much more powerful and convenient way for examining the content, so the inability to use "grep" and "awk" and the like is not seen as a loss.

    An SQLite database is a well-defined and well-documented file format that is in widespread use by literally millions of applications and is backwards compatible to its inception in 2004 and which promises to continue to be compatible in decades to come. The longevity of SQLite database files is particularly important to bespoke applications, since it allows the document content to be accessed far in the future, long after all traces of the original application have been lost. Data lives longer than code. SQLite databases are recommended by the US Library of Congress as a storage format for long-term preservation of digital content.

  • Cross-Platform. SQLite database files are portable between 32-bit and 64-bit machines and between big-endian and little-endian architectures and between any of the various flavors of Windows and Unix-like operating systems. The application using an SQLite application file format can store binary numeric data without having to worry about the byte-order of integers or floating point numbers. Text content can be read or written as UTF-8, UTF-16LE, or UTF-16BE and SQLite will automatically perform any necessary translations on-the-fly.

  • Atomic Transactions. Writes to an SQLite database are atomic. They either happen completely or not at all, even during system crashes or power failures. So there is no danger of corrupting a document just because the power happened to go out at the same instant that a change was being written to disk.

    SQLite is transactional, meaning that multiple changes can be grouped together such that either all or none of them occur, and so that the changes can be rolled back if a problem is found prior to commit. This allows an application to make a change incrementally, then run various sanity and consistency checks on the resulting data prior to committing the changes to disk. The Fossil DVCS uses this technique to verify that no repository history has been lost prior to each change.

  • Incremental And Continuous Updates. When writing to an SQLite database file, only those parts of the file that actually change are written out to disk. This makes the writing happen faster and saves wear on SSDs. This is an enormous advantage over custom and wrapped pile-of-files formats, both of which usually require a rewrite of the entire document in order to change a single byte. Pure pile-of-files formats can also do incremental updates to some extent, though the granularity of writes is usually larger with pile-of-file formats (a single file) than with SQLite (a single page).

    SQLite also supports continuous update. Instead of collecting changes in memory and then writing them to disk only on a File/Save action, changes can be written back to the disk as they occur. This avoids loss of work on a system crash or power failure. An automated undo/redo stack, managed using triggers, can be kept in the on-disk database, meaning that undo/redo can occur across session boundaries.

  • Easily Extensible. As an application grows, new features can be added to an SQLite application file format simply by adding new tables to the schema or by adding new columns to existing tables. Adding columns or tables does not change the meaning of prior queries, so with a modicum of care to ensuring that the meaning of legacy columns and tables are preserved, backwards compatibility is maintained.

    It is possible to extend custom or pile-of-files formats too, of course, but doing is often much harder. If indices are added, then all application code that changes the corresponding tables must be located and modified to keep those indices up-to-date. If columns are added, then all application code that accesses the corresponding table must be located and modified to take into account the new columns.

  • Performance. In many cases, an SQLite application file format will be faster than a pile-of-files format or a custom format. In addition to being faster for raw read and writes, SQLite can often dramatically improves start-up times because instead of having to read and parse the entire document into memory, the application can do queries to extract only the information needed for the initial screen. As the application progresses, it only needs to load as much material as is needed to draw the next screen, and can discard information from prior screens that is no longer in use. This helps keep the memory footprint of the application under control.

    A pile-of-files format can be read incrementally just like SQLite. But many developers are surprised to learn that SQLite can read and write smaller BLOBs (less than about 100KB in size) from its database faster than those same blobs can be read or written as separate files from the filesystem. (See 35% Faster Than The Filesystem and Internal Versus External BLOBs for further information.) There is overhead associated with operating a relational database engine, however one should not assume that direct file I/O is faster than SQLite database I/O, as often it is not.

    In either case, if performance problems do arise in an SQLite application those problems can often be resolved by adding one or two CREATE INDEX statements to the schema or perhaps running ANALYZE one time and without having to touch a single line of application code. But if a performance problem comes up in a custom or pile-of-files format, the fix will often require extensive changes to application code to add and maintain new indices or to extract information using different algorithms.

  • Concurrent Use By Multiple Processes. SQLite automatically coordinates concurrent access to the same document from multiple threads and/or processes. Two or more applications can connect and read from the same document at the same time. Writes are serialized, but as writes normally only take milliseconds, applications simply take turns writing. SQLite automatically ensures that the low-level format of the document is uncorrupted. Accomplishing the same with a custom or pile-of-files format, in contrast, requires extensive support in the application. And the application logic needed to support concurrency is a notorious bug-magnet.

  • Multiple Programming Languages. Though SQLite is itself written in ANSI-C, interfaces exist for just about every other programming language you can think of: C++, C#, Objective-C, Java, Tcl, Perl, Python, Ruby, Erlang, JavaScript, and so forth. So programmers can develop in whatever language they are most comfortable with and which best matches the needs of the project.

    An SQLite application file format is a great choice in cases where there is a collection or "federation" of separate programs, often written in different languages and by different development teams. This comes up commonly in research or laboratory environments where one team is responsible for data acquisition and other teams are responsible for various stages of analysis. Each team can use whatever hardware, operating system, programming language and development methodology that they are most comfortable with, and as long as all programs use an SQLite database with a common schema, they can all interoperate.

  • Better Applications. If the application file format is an SQLite database, the complete documentation for that file format consists of the database schema, with perhaps a few extra words about what each table and column represents. The description of a custom file format, on the other hand, typically runs on for hundreds of pages. A pile-of-files format, while much simpler and easier to describe than a fully custom format, still tends to be much larger and more complex than an SQL schema dump, since the names and format for the individual files must still be described.

    This is not a trivial point. A clear, concise, and easy to understand file format is a crucial part of any application design. Fred Brooks, in his all-time best-selling computer science text, The Mythical Man-Month says:

    Representation is the essence of computer programming.
    ...
    Show me your flowcharts and conceal your tables, and I shall continue to be mystified. Show me your tables, and I won't usually need your flowcharts; they'll be obvious.

    Rob Pike, in his Rules of Programming expresses the same idea this way:

    Data dominates. If you've chosen the right data structures and organized things well, the algorithms will almost always be self-evident. Data structures, not algorithms, are central to programming.

    Linus Torvalds used different words to say much the same thing on the Git mailing list on 2006-06-27:

    Bad programmers worry about the code. Good programmers worry about data structures and their relationships.

    The point is this: an SQL database schema almost always does a far better job of defining and organizing the tables and data structures and their relationships. And having clear, concise, and well-defined representation almost always results in an application that performs better, has fewer problems, and is easier to develop and maintain.

    Let's block ads! (Why?)

    Read the whole story
    GaryBIshop
    23 days ago
    reply
    Good arguments.
    Share this story
    Delete

    Dynamic Entropy

    2 Comments and 7 Shares
    Despite years of effort by my physics professors to normalize it, deep down I remain convinced that 'dynamical' is not really a word.
    Read the whole story
    GaryBIshop
    26 days ago
    reply
    Great quotes.
    Share this story
    Delete
    1 public comment
    alt_text_bot
    26 days ago
    reply
    Despite years of effort by my physics professors to normalize it, deep down I remain convinced that 'dynamical' is not really a word.
    martinbaum
    26 days ago
    I managed to get a small number of natural history museum scientists to adopt “scientifical” 20 years ago to describe the window dressing around science. You know, the lab coats, specimen trays, instrumentation of certain types, etc. Sadly, they’ve all retired or moved on.

    'Dord': A Ghost Word

    1 Share

    One of the questions people like to ask lexicographers is this: Can you sneak something into the dictionary? Can you, say, make up a word like strilpy and say it means "embarrassed" and just casually drop it into the dictionary to sit there among the real words like a legit member of the language?

    dictionary-with-pages-flipping

    Spooookyyyyy.

    The answer at Merriam-Webster is emphatically no: not only would you be violating the (imaginary) Universal Lexicographer's Ethical Code, but you'd be caught. A dictionary entry is seen by many sets of eyes on its journey from drafted definition to official dictionary entry, and no one you work with is going to violate ULEC just because you want your lexical pet to shimmer and shine like a real word. The fact is that all the words in Merriam-Webster's dictionaries are established members of the language, backed by substantial evidence of those words in published, edited text.

    But, um, there was for a time a word in one of our dictionaries that violated this principle. Just one.

    The year was 1934, and Merriam-Webster had just published a brand new dictionary that involved the labor of 250 editors and consultants. Webster's New International Dictionary, Second Edition—or "W2," as we call it—was the largest book to ever be mass produced. It weighed 17 pounds and had a six-inch-wide binding. It was beautiful and astounding. And it included this entry:

    Drop

    No etymology, no examples of the word in use. And there were no examples in our files either. Dord was what's sometimes called a "ghost word."

    How did this happen? It all started when chemistry editor Austin M. Patterson typed up a 3x5 slip of paper and stamped it. Like all such slips, it represented information that would become an entry, and this one was intended to be one of several slips for the letter "D" as an abbreviation—in this case, for "density."

    Drop

    As you can see, there are spaces between the "or" and the D’s on either side. It should have been read as saying something like "capital d and lowercase d are both used as abbreviations for 'density.'" It looks obvious that it's the word or in there, right? Not in this case. Whenever such a slip was made for an entry, spaces were required between all the letters of the term being defined so that there would be room to add any marks showing stress and syllable breaks. To the person who handled the slip, it looked like there was simply a space missing between the "o" and the "r." This person added a wavy line under all four letters, to show that they should all be in boldface, i.e. should be considered a headword. (There should have been a wavy line under each of the D's and a straight line under the "or.") The pronunciation editor penned a simple pronunciation. How it got by final reader, the etymologist, and the proofreader we'll never know. The word was made lowercase because it was understood to be a common noun, and dord as a word for "density" was born.

    In 1939, an editor discovered the error and wrote a slip—marked "imperative! urgent!"—calling for a correction in the next printing.

    Drop

    But for some reason the change wasn't actually made until 1947, after another editor wrote another slip.

    Drop

    And that—finally—marked the end of dord—Merriam-Webster's only "ghost word."

    No prolix apologies were issued, no firings commenced, and no ULEC was revoked. The work of the dictionary simply carried on, as it does.

    Let's block ads! (Why?)

    Read the whole story
    GaryBIshop
    35 days ago
    reply
    Share this story
    Delete

    Saturday Morning Breakfast Cereal - Tolkien

    2 Comments and 10 Shares


    Click here to go see the bonus panel!

    Hovertext:
    If I get up to 4 trillion dollars a month on patreon, I promise to write this novel.


    Today's News:
    Read the whole story
    GaryBIshop
    36 days ago
    reply
    Deep.
    jgbishop
    36 days ago
    I've never read that quote before. If only the library were open again so I could pick up some new stuff to read...
    MaryEllenCG
    36 days ago
    Does your library do ebooks? If so, you don't even need to go in to get some new stuff to read!
    Share this story
    Delete
    1 public comment
    CallMeWilliam
    36 days ago
    reply
    I've been reading a lot of spaceship fiction -- starships that go pewpew, aliens, politics. It's been a good escape. I've found it really hard to read anything more challenging.

    mbmbam

    4 Comments and 11 Shares
    Hello and welcome to Millibar Millibarn Attometer, an advice show for the Planck era.
    Read the whole story
    GaryBIshop
    40 days ago
    reply
    Share this story
    Delete
    4 public comments
    hannahdraper
    39 days ago
    reply
    In a math theory class I took for fun in high school (shut up), one of my friends and I submitted all of our answers in base -3. Our teacher gave us a pass because we did the calculations correctly.
    Washington, DC
    duerig
    39 days ago
    My favorite bases are -1 + i and balanced ternary. There is just something lovely about being able to represent all integers (or complex integers) without the sloppy random 'negative' sign at the front. https://en.wikipedia.org/wiki/Complex-base_system#Base_%E2%88%921_%C2%B1_i https://en.wikipedia.org/wiki/Balanced_ternary
    pavel_lishin
    40 days ago
    reply
    My dad used to force his students to convert their answers into arbitrary units, just to make sure they understood both the problem, and conversions.
    New York, NY
    cjheinz
    40 days ago
    reply
    "Units are weird".
    Dimensional analysis is one of the best multitools in Physics.
    alt_text_bot
    40 days ago
    reply
    Hello and welcome to Millibar Millibarn Attometer, an advice show for the Planck era.
    Next Page of Stories