4.2. Create a SQLite database from CSV/Excel/JSON/SQLite/etc. files

sqlitebiter file is a subcommand to convert tabular data file(s) to a SQLite database file.

4.2.1. Examples

You can use wildcards to convert multiple files into a database file. glob?

File formats are automatically detected from the extensions.

Example:
$ ls
sample_data.csv  sample_data.xlsx  sample_data_multi.json  sample_data_single.json
$ sqlitebiter -o sample.sqlite file *
[INFO] sqlitebiter file: convert 'sample_data.csv' to 'sample_data' table
[INFO] sqlitebiter file: convert 'sample_data_multi.json' to 'table_a' table
[INFO] sqlitebiter file: convert 'sample_data_multi.json' to 'table_b' table
[INFO] sqlitebiter file: convert 'sample_data_single.json' to 'sample_data_single' table
[INFO] sqlitebiter file: convert 'sample_data.xlsx' to 'samplesheet1' table
[INFO] sqlitebiter file: convert 'sample_data.xlsx' to 'samplesheet3' table
[INFO] sqlitebiter file: converted results: source=4, success=6, created-table=6
[INFO] sqlitebiter file: database path: sample.sqlite
Output:
$ sqlite3 sample.sqlite .schema
CREATE TABLE IF NOT EXISTS '_source_info_' ("source_id" INTEGER NOT NULL, "dir_name" TEXT, "base_name" TEXT NOT NULL, "format_name" TEXT NOT NULL, "dst_table" TEXT NOT NULL, size INTEGER, mtime INTEGER);
CREATE TABLE IF NOT EXISTS 'sample_data' ("attr_a" INTEGER, "attr_b" REAL, "attr_c" TEXT);
CREATE TABLE IF NOT EXISTS 'table_a' ("attr_a" INTEGER, "attr_b" REAL, "attr_c" TEXT);
CREATE TABLE IF NOT EXISTS 'table_b' (a INTEGER, b REAL);
CREATE TABLE IF NOT EXISTS 'sample_data_single' ("attr_a" INTEGER, "attr_b" REAL, "attr_c" TEXT);
CREATE TABLE IF NOT EXISTS 'samplesheet1' (a INTEGER, b REAL, c TEXT);
CREATE TABLE IF NOT EXISTS 'samplesheet3' (aa INTEGER, ab TEXT, ac TEXT);

Designate multiple file paths to convert:

Example:
$ sqlitebiter file sample_data.csv sample_data.xlsx
[INFO] sqlitebiter file: convert 'sample_data.csv' to 'sample_data' table
[INFO] sqlitebiter file: convert 'sample_data.xlsx' to 'samplesheet1' table
[INFO] sqlitebiter file: convert 'sample_data.xlsx' to 'samplesheet3' table
[INFO] sqlitebiter file: converted results: source=2, success=3, created-table=3
[INFO] sqlitebiter file: database path: out.sqlite

Note

You could explicitly specify converting file format with the --format option

Example:
$ sqlitebiter file --format csv sample_data
[INFO] sqlitebiter file: convert 'sample_data.csv' to 'sample_data' table
[INFO] sqlitebiter file: converted results: source=1, success=1, created-table=1
[INFO] sqlitebiter file: database path: out.sqlite

4.2.2. sqlitebiter file subcommand help

Usage: sqlitebiter file [OPTIONS] [FILES]...

  Convert tabular data within CSV/Excel/HTML/JSON/Jupyter
  Notebook/LDJSON/LTSV/Markdown/Mediawiki/SQLite/SSV/TSV file(s) or named
  pipes to a SQLite database file.

Options:
  -r, --recursive                 Read all files under each directory,
                                  recursively.
  --pattern PATTERN               Convert files matching PATTERN.
  --exclude PATTERN               Exclude files matching PATTERN.
  --follow-symlinks               Follow symlinks.
  -f, --format [csv|excel|html|json|json_lines|jsonl|ldjson|ltsv|markdown|mediawiki|ndjson|sqlite|ssv|tsv|ipynb]
                                  Data format to loading (auto-detect from
                                  file extensions in default).
  --encoding ENCODING             Encoding to load files. Auto-detection from
                                  files in default.
  -h, --help                      Show this message and exit.

  Documentation: https://sqlitebiter.rtfd.io/ Issue tracker:
  https://github.com/thombashi/sqlitebiter/issues

4.2.3. Supported data formats

The following table shows the supported data formats:

Available data formats

Format

File Extension

Remarks

CSV

.csv

Excel

.xlsx/.xls

Create a table for each sheet in the Excel workbook.

HTML

.html/.htm

Scrape tabular data from <table> tags in the HTML file.
And create table for each <table> tag data.

JSON

.json

Jupyter Notebook

.ipynb

Line-delimited JSON

.jsonl/.ldjson/.ndjson

LTSV

.ltsv

Markdown

.md

Extract tabular data in the Markdown file.
And create a table for each <table> tabular data.

SQLite

.sqlite/.sqlite3

TSV

.tsv

4.2.4. Table naming convention

Table name automatically decided as follows for each format:

Format

Table Name

CSV

<filename>

Excel

<Sheet name>

HTML

<title>_<key>. <title> replaced with the title tag of the page.
<key> replaced with: (1) id attribute of the table tag.
(2) unique string if id attribute not present in the table tag.

JSON

LTSV

<filename>

Markdown

<filename>

TSV

<filename>

  • Common behavior
    • <filename> replaced with the filename of converting file (without extensions)
      • e.g. If the input file name is sample.csv, <filename> is sample

If a created table name already exists in the database, the behavior differs depending on the existing table (after this referred to as A) and create table (after this referred to as B) structure:

  1. A and B has the same table name and table structure
    • Append creating table data to the existing table data

  2. A and B has the same table name, but the different table structure
    • sqlitebiter try to create unique table name for B by appending suffix id number