4.2. Create a SQLite database from CSV/Excel/JSON/SQLite/etc. files¶
sqlitebiter file
is a subcommand to convert tabular data file(s) to a SQLite database file.
4.2.1. Examples¶
You can use wildcards to convert multiple files into a database file. glob?
File formats are automatically detected from the extensions.
- Example:
$ ls sample_data.csv sample_data.xlsx sample_data_multi.json sample_data_single.json $ sqlitebiter -o sample.sqlite file * [INFO] sqlitebiter file: convert 'sample_data.csv' to 'sample_data' table [INFO] sqlitebiter file: convert 'sample_data_multi.json' to 'table_a' table [INFO] sqlitebiter file: convert 'sample_data_multi.json' to 'table_b' table [INFO] sqlitebiter file: convert 'sample_data_single.json' to 'sample_data_single' table [INFO] sqlitebiter file: convert 'sample_data.xlsx' to 'samplesheet1' table [INFO] sqlitebiter file: convert 'sample_data.xlsx' to 'samplesheet3' table [INFO] sqlitebiter file: converted results: source=4, success=6, created-table=6 [INFO] sqlitebiter file: database path: sample.sqlite
- Output:
$ sqlite3 sample.sqlite .schema CREATE TABLE IF NOT EXISTS '_source_info_' ("source_id" INTEGER NOT NULL, "dir_name" TEXT, "base_name" TEXT NOT NULL, "format_name" TEXT NOT NULL, "dst_table" TEXT NOT NULL, size INTEGER, mtime INTEGER); CREATE TABLE IF NOT EXISTS 'sample_data' ("attr_a" INTEGER, "attr_b" REAL, "attr_c" TEXT); CREATE TABLE IF NOT EXISTS 'table_a' ("attr_a" INTEGER, "attr_b" REAL, "attr_c" TEXT); CREATE TABLE IF NOT EXISTS 'table_b' (a INTEGER, b REAL); CREATE TABLE IF NOT EXISTS 'sample_data_single' ("attr_a" INTEGER, "attr_b" REAL, "attr_c" TEXT); CREATE TABLE IF NOT EXISTS 'samplesheet1' (a INTEGER, b REAL, c TEXT); CREATE TABLE IF NOT EXISTS 'samplesheet3' (aa INTEGER, ab TEXT, ac TEXT);
Designate multiple file paths to convert:
- Example:
$ sqlitebiter file sample_data.csv sample_data.xlsx [INFO] sqlitebiter file: convert 'sample_data.csv' to 'sample_data' table [INFO] sqlitebiter file: convert 'sample_data.xlsx' to 'samplesheet1' table [INFO] sqlitebiter file: convert 'sample_data.xlsx' to 'samplesheet3' table [INFO] sqlitebiter file: converted results: source=2, success=3, created-table=3 [INFO] sqlitebiter file: database path: out.sqlite
Note
See also: Limitations of JSON conversion
Wildcard characters cannot use in Windows environments
You could explicitly specify converting file format with the --format
option
- Example:
$ sqlitebiter file --format csv sample_data [INFO] sqlitebiter file: convert 'sample_data.csv' to 'sample_data' table [INFO] sqlitebiter file: converted results: source=1, success=1, created-table=1 [INFO] sqlitebiter file: database path: out.sqlite
4.2.2. sqlitebiter file
subcommand help¶
Usage: sqlitebiter file [OPTIONS] [FILES]...
Convert tabular data within CSV/Excel/HTML/JSON/Jupyter
Notebook/LDJSON/LTSV/Markdown/Mediawiki/SQLite/SSV/TSV file(s) or named
pipes to a SQLite database file.
Options:
-r, --recursive Read all files under each directory,
recursively.
--pattern PATTERN Convert files matching PATTERN.
--exclude PATTERN Exclude files matching PATTERN.
--follow-symlinks Follow symlinks.
-f, --format [csv|excel|html|json|json_lines|jsonl|ldjson|ltsv|markdown|mediawiki|ndjson|sqlite|ssv|tsv|ipynb]
Data format to loading (auto-detect from
file extensions in default).
--encoding ENCODING Encoding to load files. Auto-detection from
files in default.
-h, --help Show this message and exit.
Documentation: https://sqlitebiter.rtfd.io/ Issue tracker:
https://github.com/thombashi/sqlitebiter/issues
4.2.3. Supported data formats¶
The following table shows the supported data formats:
Format |
File Extension |
Remarks |
---|---|---|
CSV |
|
|
Excel |
|
Create a table for each sheet in the Excel workbook. |
HTML |
|
Scrape tabular data from
<table> tags in the HTML file.And create table for each
<table> tag data. |
JSON |
|
|
Jupyter Notebook |
|
|
Line-delimited JSON |
|
|
LTSV |
|
|
Markdown |
|
Extract tabular data in the Markdown file.
And create a table for each
<table> tabular data. |
SQLite |
|
|
TSV |
|
4.2.4. Table naming convention¶
Table name automatically decided as follows for each format:
Format
Table Name
CSV
<filename>
Excel
<Sheet name>
HTML
<title>_<key>
.<title>
replaced with thetitle
tag of the page.<key>
replaced with: (1)id
attribute of the table tag.(2) unique string ifid
attribute not present in the table tag.JSON
LTSV
<filename>
Markdown
<filename>
TSV
<filename>
- Common behavior
<filename>
replaced with the filename of converting file (without extensions)e.g. If the input file name is
sample.csv
,<filename>
issample
If a created table name already exists in the database, the behavior differs depending on
the existing table (after this referred to as A
) and create table (after this referred to as B
) structure:
A
andB
has the same table name and table structureAppend creating table data to the existing table data
A
andB
has the same table name, but the different table structuresqlitebiter
try to create unique table name forB
by appending suffix id number