duckdb -c "FROM 'hf://datasets/wikimedia/structured-wikipedia/enwiki/data/*.parquet' LIMIT 5". The English version is 35 GB and has 7.6 million articles, so you're better off downloading it than running analyses remotely. #database

usql queries MySQL, Postgres, SQLite, MSSQL, Oracle, etc. via a single interface. For example, usql 'mysql://rfamro:@mysql-rfam-public.ebi.ac.uk:4497/Rfam' -c "SELECT * FROM clan limit 3;". But DuckDB is more versatile, IMHO. #database

INSTALL mysql; LOAD mysql;
ATTACH 'host=mysql-rfam-public.ebi.ac.uk port=4497 user=rfamro database=Rfam' AS rfam (TYPE mysql);
SELECT * from rfam.Rfam.clan LIMIT 3;
SELECT * FROM 'file.xlsx' LIMIT 3;
SELECT * FROM 'file.csv' LIMIT 3;
#database

SIGSEGV (Address boundary error) when connecting to SQLite databases. #dev #database #markdown

sqlite3 'file:places.sqlite?mode=ro&nolock=1'. datasette uses this. For example, to read the Edge history on Linux, use datasette ~/.config/microsoft-edge/Default/History --nolock. #database

WITH
sessions AS (FROM events SELECT COUNT(DISTINCT session_id) AS value),
pages AS (FROM events SELECT COUNT(*) AS value)
FROM sessions, pages
SELECT pages.value / sessions.value AS pages_per_session;
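
The same two-CTE pattern can be tried without DuckDB. Below is a minimal sketch using Python's built-in sqlite3 module, assuming a hypothetical events table with session_id and page columns (the column names and sample rows are made up for illustration). SQLite has no FROM-first syntax, so the query is written in standard SELECT ... FROM order, and the count is cast to REAL because SQLite's / truncates integers. It divides the page count by the session count, which is what the column name pages_per_session implies.

```python
import sqlite3


def pages_per_session(conn: sqlite3.Connection) -> float:
    """Return total page views divided by the number of distinct sessions."""
    query = """
    WITH
      sessions AS (SELECT COUNT(DISTINCT session_id) AS value FROM events),
      pages AS (SELECT COUNT(*) AS value FROM events)
    SELECT CAST(pages.value AS REAL) / sessions.value AS pages_per_session
    FROM sessions, pages
    """
    return conn.execute(query).fetchone()[0]


# Hypothetical sample data: 6 page views spread over 3 sessions.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (session_id TEXT, page TEXT)")
conn.executemany(
    "INSERT INTO events VALUES (?, ?)",
    [("a", "/"), ("a", "/docs"), ("a", "/blog"),
     ("b", "/"), ("b", "/docs"),
     ("c", "/")],
)
ratio = pages_per_session(conn)
print(ratio)
```

Each CTE returns exactly one row, so the comma join in the final SELECT yields a single row as well.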
#database
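
Going back to the mode=ro&nolock=1 SQLite URI noted earlier: the same URI parameters also work from Python's standard library. This is a rough sketch, not what datasette does internally. The helper name open_readonly_nolock and the temporary places.sqlite file with its urls table are assumptions made up for the demo. The presumed use case is reading a database that another process keeps open, so nolock should only be used when you accept the risk of reading while someone else writes.

```python
import sqlite3
import tempfile
from pathlib import Path


def open_readonly_nolock(path: Path) -> sqlite3.Connection:
    """Open a SQLite file read-only and without file locking."""
    # as_uri() needs an absolute path and yields file:///...;
    # uri=True makes sqlite3 honour the query parameters.
    uri = path.resolve().as_uri() + "?mode=ro&nolock=1"
    return sqlite3.connect(uri, uri=True)


# Build a small throwaway database to read from (hypothetical schema).
db_path = Path(tempfile.mkdtemp()) / "places.sqlite"
rw = sqlite3.connect(db_path)
rw.execute("CREATE TABLE urls (url TEXT)")
rw.execute("INSERT INTO urls VALUES ('https://duckdb.org')")
rw.commit()
rw.close()

ro = open_readonly_nolock(db_path)
rows = ro.execute("SELECT url FROM urls").fetchall()

# Writes are rejected because of mode=ro.
try:
    ro.execute("INSERT INTO urls VALUES ('x')")
    write_blocked = False
except sqlite3.OperationalError:
    write_blocked = True
ro.close()
print(rows, write_blocked)
```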