4.2. Create a SQLite database from CSV/Excel/JSON/SQLite/etc. files

sqlitebiter file is a subcommand to convert tabular data file(s) to a SQLite database file.

4.2.1. Examples

Using wildcard to convert multiple files. File formats are automatically detected from the extensions.

Example:
$ ls
sample_data.csv  sample_data.xlsx  sample_data_multi.json  sample_data_single.json
$ sqlitebiter -o sample.sqlite file *
[INFO] sqlitebiter file: convert 'sample_data.csv' to 'sample_data' table
[INFO] sqlitebiter file: convert 'sample_data_multi.json' to 'table_a' table
[INFO] sqlitebiter file: convert 'sample_data_multi.json' to 'table_b' table
[INFO] sqlitebiter file: convert 'sample_data_single.json' to 'sample_data_single' table
[INFO] sqlitebiter file: convert 'sample_data.xlsx' to 'samplesheet1' table
[INFO] sqlitebiter file: convert 'sample_data.xlsx' to 'samplesheet3' table
[INFO] sqlitebiter file: converted results: source=4, success=6, created-table=6
[INFO] sqlitebiter file: database path: sample.sqlite
Output:
$ sqlite3 sample.sqlite .schema
CREATE TABLE IF NOT EXISTS '_source_info_' ("source_id" INTEGER NOT NULL, "dir_name" TEXT, "base_name" TEXT NOT NULL, "format_name" TEXT NOT NULL, "dst_table" TEXT NOT NULL, size INTEGER, mtime INTEGER);
CREATE TABLE IF NOT EXISTS 'sample_data' ("attr_a" INTEGER, "attr_b" REAL, "attr_c" TEXT);
CREATE TABLE IF NOT EXISTS 'table_a' ("attr_a" INTEGER, "attr_b" REAL, "attr_c" TEXT);
CREATE TABLE IF NOT EXISTS 'table_b' (a INTEGER, b REAL);
CREATE TABLE IF NOT EXISTS 'sample_data_single' ("attr_a" INTEGER, "attr_b" REAL, "attr_c" TEXT);
CREATE TABLE IF NOT EXISTS 'samplesheet1' (a INTEGER, b REAL, c TEXT);
CREATE TABLE IF NOT EXISTS 'samplesheet3' (aa INTEGER, ab TEXT, ac TEXT);

Designate multiple file path to convert:

Example:
$ sqlitebiter file sample_data.csv sample_data.xlsx
[INFO] sqlitebiter file: convert 'sample_data.csv' to 'sample_data' table
[INFO] sqlitebiter file: convert 'sample_data.xlsx' to 'samplesheet1' table
[INFO] sqlitebiter file: convert 'sample_data.xlsx' to 'samplesheet3' table
[INFO] sqlitebiter file: converted results: source=2, success=3, created-table=3
[INFO] sqlitebiter file: database path: out.sqlite

Note

  • Available JSON Schema is limited. Acceptable format described in here
  • Wildcard characters cannot use in Windows environments

You could specify converting file format with the --format option

Example:
$ sqlitebiter file --format csv sample_data
[INFO] sqlitebiter file: convert 'sample_data.csv' to 'sample_data' table
[INFO] sqlitebiter file: converted results: source=1, success=1, created-table=1
[INFO] sqlitebiter file: database path: out.sqlite

4.2.2. sqlitebiter file subcommand help

Usage: sqlitebiter file [OPTIONS] [FILES]...

  Convert tabular data within CSV/Excel/HTML/JSON/Jupyter
  Notebook/LDJSON/LTSV/Markdown/Mediawiki/SQLite/SSV/TSV file(s) or named
  pipes to a SQLite database file.

Options:
  -r, --recursive                 Read all files under each directory,
                                  recursively.
  --pattern PATTERN               Convert files matching PATTERN.
  --exclude PATTERN               Exclude files matching PATTERN.
  --follow-symlinks               Follow symlinks.
  -f, --format [csv|excel|html|json|json_lines|jsonl|ldjson|ltsv|markdown|mediawiki|ndjson|sqlite|ssv|tsv|ipynb]
                                  Data format to loading (auto-detect from
                                  file extensions in default).
  --encoding ENCODING             Encoding to load files. Auto-detection from
                                  files in default.
  -h, --help                      Show this message and exit.

  Documentation: https://sqlitebiter.rtfd.io/ Issue tracker:
  https://github.com/thombashi/sqlitebiter/issues

4.2.3. Supported data formats

Following table shows that the supported data formats:

Available data formats
Format File Extension Remarks
CSV .csv  
Excel .xlsx/.xls Create table for each sheet in the Excel workbook.
HTML .html/.htm
Scrape tabular data from <table> tags in the HTML file.
And create table for each <table> tag data.
JSON .json  
Jupyter Notebook .ipynb  
Line-delimited JSON .jsonl/.ldjson/.ndjson  
LTSV .ltsv  
Markdown .md
Extract tabular data in the Markdown file.
And create a table for each <table> tabular data.
SQLite .sqlite/.sqlite3  
TSV .tsv  

4.2.4. Table naming convention

Table name automatically decided as follows for each format:

Format Table Name
CSV <filename>
Excel <Sheet name>
HTML
<title>_<key>. <title> replaced with the title tag of the page.
<key> replaced with: (1) id attribute of the table tag.
(2) unique string if id attribute not present in the table tag.
JSON  
LTSV <filename>
Markdown <filename>
TSV <filename>
  • Common behavior
    • <filename> replaced with filename of converting file (without extensions)
      • e.g. If the input file name is sample.csv, <filename> is sample

If a created table name already exists in the database, the behavior differs depending on the existing table (after this referred to as A) and create table (after this referred to as B) structure:

  1. A and B has the same table name and table structure
    • Append creating table data to the existing table data
  2. A and B has the same table name, but different table structure
    • sqlitebiter try to create unique table name for B by appending suffix id number