psql commands look like cryptic backslash incantations until the day they become the single most productive surface in the entire PostgreSQL stack. Senior data engineers live in psql — not pgAdmin, not DBeaver — because psql is a real REPL with auto-completion, query history, transactions, scripting, and a bulk-load primitive that streams gigabytes over the wire without ever touching the server filesystem.
This reference walks through the four jobs psql does better than any GUI — inspection (\d+), bulk-load (\copy), connection (.pgpass + pg_service.conf + IAM), and scripting (\set, \gset, \if, ON_ERROR_STOP) — with the dialect-by-dialect comparison of COPY vs \copy, the four sslmode levels, and the CI flags every migration runner depends on. Each section pairs a teaching block with a Solution-Tail interview answer — code, a step-by-step trace, an output table, then a concept-by-concept breakdown of why it works.
When you want hands-on reps immediately after reading, drill the database practice library →, rehearse on SQL practice problems →, and stack the bulk-load muscles with ETL practice drills →.
On this page
- Why psql is the data engineer's secret weapon
- The top 30 meta-commands every PG engineer should memorise
- COPY vs \copy — the bulk-load showdown
- Connection strings, .pgpass, and the auth flow
- Scripting psql — variables, conditionals, and CI-friendly runs
- Cheat sheet — psql power-user recipes
- Frequently asked questions
- Practice on PipeCode
1. Why psql is the data engineer's secret weapon
psql is a thin client over libpq — every meta-command runs locally, every SQL statement crosses the wire
The one-sentence invariant: psql commands prefixed with a backslash (\dt, \copy, \set, \i) are evaluated entirely on the client side and never reach the PostgreSQL server — everything else is a SQL statement sent over the libpq protocol. Once you internalise that split, the entire surface of psql becomes a clean mental model: the backslash universe is your scripting host, the SQL universe is the database.
Why every senior PG engineer lives in psql, not pgAdmin.
-
Speed. A GUI takes three clicks and 600 ms to expand a table's column list.
\d+ ordersreturns the same information in 30 ms with table size, index list, and row estimate included. -
Scriptability. GUIs do not version-control well. A psql session with
\i migrate.sqlcan be replayed in CI bit-for-bit identically. The CI flag-v ON_ERROR_STOP=1makes any failing statement abort the whole run. -
Honesty. GUIs hide the wire-protocol details that matter —
search_path, current role, SSL mode, connection pooler in front. psql shows you all of them via\conninfo. -
Composability. psql plays well with shell pipes:
psql -c "SELECT ..." | jq,pg_dump | psql -h replica,psql -f migrate.sql | tee migrate.log.
The three superpowers in one sentence each.
-
Introspection —
\d+ tablereturns columns, types, indexes, sizes, and statistics in one screen. Replaces three GUI panels. -
Bulk-load —
\copy table FROM 'file.csv' WITH (FORMAT CSV, HEADER)streams a file from the client machine to the database over the wire. No filesystem access required on the server. -
Scripting —
\set,\gset,\if,\elif,\else,\endifturn psql into a real conditional script host that runs the same on a laptop as in CI.
Where psql fits in a modern stack.
-
CI scripts. Every dbt-style or Flyway-style migration framework eventually shells out to
psql -v ON_ERROR_STOP=1 -f file.sql. Knowing the CI flags pays for itself the first time a migration fails halfway through. -
Ad-hoc analysis. Senior DEs reach for psql before notebooks for "answer this question in the next five minutes" work — auto-complete + history +
\timing onbeats every other surface. -
Bulk-loading. Loading a 50 GB CSV via the application layer is a recipe for a five-hour ingest.
\copypluspvfor a progress bar plus a quickSET synchronous_commit = offruns the same load in 18 minutes.
The 2026 reality.
-
\copyis the only bulk-load primitive you should reach for in a cloud-hosted database — RDS, Aurora, Cloud SQL, Azure Database for PostgreSQL all forbid server-sideCOPY ... FROM '/path'because the client has no filesystem on the DB host. -
.pg_service.confis the modern way to name connection profiles.psql service=prod-warehouseis shorter than retyping a 200-character connection string and survives a password rotation by editing one file. -
\gsetis criminally underused — capturing a query result into a psql variable is how you write idempotent migrations and CI scripts that branch on real data. -
SET synchronous_commit = OFFbefore a bulk load can cut wall-clock time by 30 to 60 percent on RDS without sacrificing crash-recovery correctness.
Worked example — psql in the wild, six commands that replace a GUI session
Detailed explanation. A new DE inherits a Postgres database with 30 schemas and 4,000 tables. The task is "find the largest table in the analytics schema, look at its columns, and check its indexes." A GUI requires a tree expansion, a properties dialog, and a separate index view. psql does the whole job in six commands.
Question. Walk through the six-command psql session that lists schemas, switches into one, lists tables ranked by size, and inspects the biggest table's columns and indexes.
Input. The terminal connected to the database with psql -h db.prod -U analytics -d warehouse.
Code.
warehouse=> \dn
List of schemas
Name | Owner
----------+----------------
public | postgres
analytics| analytics_owner
warehouse=> \dt analytics.*
List of relations
Schema | Name | Type | Owner
-----------+---------------+-------+----------
analytics | events | table | analytics_owner
analytics | orders | table | analytics_owner
analytics | sessions_daily| table | analytics_owner
warehouse=> SELECT relname, pg_size_pretty(pg_total_relation_size('analytics.' || relname)) AS size
-> FROM pg_stat_user_tables WHERE schemaname = 'analytics'
-> ORDER BY pg_total_relation_size('analytics.' || relname) DESC LIMIT 5;
relname | size
-----------------+--------
events | 412 GB
orders | 58 GB
sessions_daily | 12 GB
warehouse=> \d+ analytics.events
Table "analytics.events"
Column | Type | Nullable | Default | Storage | Description
------------+---------------+----------+---------+---------+----------------
event_id | bigint | not null | | plain | surrogate key
user_id | bigint | | | plain |
event_time | timestamptz | not null | | plain |
payload | jsonb | | | extended|
Indexes:
"events_pkey" PRIMARY KEY, btree (event_id)
"events_user_id_idx" btree (user_id)
"events_event_time_idx" btree (event_time)
Partitions: events_2026_01 ...
warehouse=> \timing on
Timing is on.
warehouse=> SELECT count(*) FROM analytics.events WHERE event_time >= NOW() - INTERVAL '1 day';
count
-----------
4218394
Time: 412.318 ms
Step-by-step explanation.
-
\dnlists schemas — the equivalent of expanding the database node in a GUI tree. -
\dt analytics.*lists tables in that schema. The asterisk is psql glob —\dt analytics.events*would filter to a prefix. - A SQL query against
pg_stat_user_tablesranks tables by total size (including indexes and TOAST). This is impossible in most GUIs without third-party tooling. -
\d+ analytics.eventsprints columns, types, nullability, defaults, storage type, descriptions, indexes, and any partitions — the GUI-equivalent of three panels at once. -
\timing onenables wall-clock measurement of every subsequent query. The DE now knows the COUNT scanned 4.2 M rows in 412 ms — a useful signal for whether the index is being used. - The session is repeatable: pipe these into a
.sqlfile, run withpsql -f, and you have a reproducible inspection script.
Output. Same data a GUI would produce, plus query timing — all in plain text that can be copy-pasted into a ticket.
Rule of thumb. Any time you reach for a GUI to inspect a Postgres database, ask "is there a \d for this?" — the answer is almost always yes, and psql will be twice as fast.
Worked example — psql vs sql client SDK, the wire-protocol perspective
Detailed explanation. Junior engineers sometimes argue that psycopg2 or the Node pg library is "the same thing" as psql. It is not. psql is libpq plus a backslash-prefixed REPL that runs the meta-commands locally. SDKs are libpq without the REPL — so they cannot execute \dt or \copy because those are client-side constructs, not SQL.
Question. Show why \dt works in psql but raises a syntax error if you send it through psycopg2's cursor.execute(). Translate \dt analytics.* into the equivalent SQL against pg_catalog.
Input. A connected psql session and a connected psycopg2 connection.
Code.
# WRONG — \dt is psql-only; the server never sees it
import psycopg2
conn = psycopg2.connect("dbname=warehouse user=analytics")
cur = conn.cursor()
cur.execute("\\dt analytics.*")
# raises: syntax error at or near "\"
# RIGHT — execute the SQL that \dt expands to
cur.execute("""
SELECT n.nspname AS schema, c.relname AS table_name
FROM pg_class c
JOIN pg_namespace n ON n.oid = c.relnamespace
WHERE c.relkind = 'r' AND n.nspname = 'analytics'
ORDER BY c.relname;
""")
print(cur.fetchall())
Step-by-step explanation.
- psql intercepts every line beginning with a backslash.
\dtis rewritten on the client into a query againstpg_classandpg_namespace, then the result is formatted with psql's table renderer. - psycopg2 sends raw bytes to the server. The PostgreSQL parser sees
\dt analytics.*and raises a syntax error — there is no SQL keyword\dt. - The "right" version explicitly queries the catalog tables psql would have queried. This is exactly the SQL psql sends when you type
\dt analytics.*. - To see the SQL psql would generate, start psql with
-E(echo) — every meta-command prints the underlying SQL before executing. Invaluable for porting psql habits into SDK code.
Output. The Python SDK version returns the same row list, but you wrote the SQL yourself. The psql version is six characters; the SDK version is twelve lines.
Rule of thumb. When automation needs the same data a meta-command shows, run psql with -E once to capture the underlying SQL, then embed that SQL in your SDK code. Never try to ship a backslash command across the wire — it cannot work.
Worked example — repeatable ad-hoc analysis with .psqlrc
Detailed explanation. Every senior PG engineer keeps a ~/.psqlrc file with their preferred defaults — \timing on, prompt format, error verbosity, output null marker. New tabs open with the same setup every time. This is the analyst equivalent of dotfiles.
Question. Write a .psqlrc that enables timing, sets NULL to display as the literal ␀ (visible), enables verbose error output, sets a multi-line continuation prompt, and aborts the session on error in script mode.
Input. The file ~/.psqlrc (read on every psql startup).
Code.
-- ~/.psqlrc
\set QUIET 1 -- silent until the prompt
\timing on -- wall-clock for every query
\pset null '␀' -- visible NULL marker
\set VERBOSITY verbose -- full error detail
\set HISTSIZE 100000 -- big query history
\set PROMPT1 '%[%033[1;33m%]%n@%/%R%#%[%033[0m%] '
\set PROMPT2 ' ... > '
\set ON_ERROR_STOP on -- abort scripts on first error
\set COMP_KEYWORD_CASE upper
\unset QUIET
\echo 'psql session started — \\timing on, NULL=␀, VERBOSITY=verbose'
Step-by-step explanation.
-
\set QUIET 1silences the welcome banner while.psqlrcruns.\unset QUIETat the end restores normal verbosity for the user session. -
\timing onmakes every query report wall-clock time. The very first time you forget to add this and run a slow query, you will wish it had been on. -
\pset null '␀'replaces invisible NULLs with a visible Unicode codepoint, so you can see the difference between empty string and NULL at a glance. -
\set ON_ERROR_STOP onis global — every script run from this session, including via\i file.sql, will abort on the first error. The CLI flag-v ON_ERROR_STOP=1is what scripts use; this is the interactive equivalent. - The prompt formats include the user, the current database (
%/), and the transaction state marker (%R%#) — so you always know whether you are mid-transaction in a CTE-heavy session.
Output. Every new psql tab starts with timing on, errors verbose, NULLs visible, and a colourful prompt — zero per-session muscle memory required.
Rule of thumb. A six-line ~/.psqlrc saves a senior DE ten minutes a day. Commit it to your dotfiles repo and pair-program with future-you.
PostgreSQL interview question on the psql vs server boundary
A senior interviewer often opens with: "Explain what happens when you type \dt in psql versus typing SELECT 1 — which one crosses the wire, what does the server see, and how would you prove it?" This blends the libpq protocol mental model with practical debugging.
Solution Using psql -E to expose the underlying SQL
# Start psql with -E to echo every meta-command's underlying SQL
psql -E -h db.prod -U analytics -d warehouse
warehouse=> \dt analytics.*
********* QUERY **********
SELECT n.nspname as "Schema",
c.relname as "Name",
CASE c.relkind WHEN 'r' THEN 'table' END as "Type",
pg_catalog.pg_get_userbyid(c.relowner) as "Owner"
FROM pg_catalog.pg_class c
LEFT JOIN pg_catalog.pg_namespace n ON n.oid = c.relnamespace
WHERE c.relkind IN ('r','p','')
AND n.nspname OPERATOR(pg_catalog.~) '^(analytics)$' COLLATE pg_catalog.default
AND pg_catalog.pg_table_is_visible(c.oid)
ORDER BY 1,2;
**************************
List of relations
Schema | Name | Type | Owner
-----------+---------------+-------+----------
analytics | events | table | analytics_owner
analytics | orders | table | analytics_owner
Step-by-step trace.
| Command typed | What psql does | What the server sees |
|---|---|---|
\dt analytics.* |
rewrites into a pg_class / pg_namespace query |
the rewritten SQL above |
SELECT 1 |
sends literally | SELECT 1 |
\timing on |
flips a client flag | nothing — server never notified |
\copy t FROM 'f.csv' CSV HEADER |
opens local file, issues COPY t FROM STDIN CSV HEADER, streams bytes |
COPY t FROM STDIN CSV HEADER plus the byte stream |
\set var 'value' |
stores in client memory | nothing |
The -E (echo) flag is the canonical proof — every meta-command prints its expansion before executing, so you can verify exactly what the server sees.
Output:
| Surface | Crosses wire? | Server sees |
|---|---|---|
| Backslash meta-commands | yes, as SQL or COPY | the expanded SQL |
Pure backslash settings (\timing, \pset, \set) |
no | nothing |
| Bare SQL | yes | the SQL as typed |
\copy |
yes |
COPY ... FROM STDIN and the byte stream |
Why this works — concept by concept:
- Client / server boundary — psql is a libpq client. Anything beginning with a backslash is a client-side instruction; the server only ever receives SQL strings (or COPY byte streams).
-
psql -E echoes meta-commands — debugging trick to expose the underlying SQL; pair it with
\timing onto also see how the catalog query performs. -
pg_class + pg_namespace — the catalog tables
\dtand\dquery under the hood. Every introspection meta-command can be reproduced as a query againstpg_catalog. -
\copy is special — it does cross the wire (the bytes go to the server's
COPY FROM STDIN), but the file open happens on the client. That is what makes it the right tool for remote databases. - Cost — meta-command rewrites are constant-cost on the client; the SQL expansion costs the same as any catalog query. No special server load.
SQL
Topic — database
PostgreSQL database problems (SQL)
2. The top 30 meta-commands every PG engineer should memorise
psql commands map to five jobs — meta / help, navigation, inspection, copy / output, scripting — master these and you live in psql instead of a GUI
The mental model in one line: psql's 30 most-used backslash commands cluster into five categories, each replacing what a GUI does in three to five clicks. Once you know the five categories cold, every other meta-command becomes a \? away from being memorised.
The five-jobs model.
| Job | Commands | What it replaces |
|---|---|---|
| Meta / help |
\?, \h, \q, \conninfo, \encoding, \!
|
the GUI "About this connection" dialog |
| Navigation |
\c, \l, \dn, \dt, \dv, \du
|
the tree expansion in the GUI sidebar |
| Inspection |
\d, \d+, \sf, \sv, \dx, \df
|
the table / view / function "properties" dialog |
| Copy / output |
\x, \timing, \copy, \pset, \H
|
export-to-CSV, query-timing, expanded-display |
| Scripting |
\i, \ir, \set, \gset, \if, \echo
|
the SQL editor's "run script" button + variable substitution |
The 30 commands in five tiers.
-
Tier 1 — quit, help, connect.
\qquit;\?meta-command help;\hSQL syntax help;\conninfocurrent connection details;\c dbnameswitch database;\c "host=... user=..."reconnect with new connection string;\! cmdshell escape. -
Tier 2 — list everything.
\llist databases;\dnlist schemas;\dtlist tables;\dvlist views;\dflist functions;\dulist roles;\dxlist extensions;\dlist everything in the current schema. -
Tier 3 — describe.
\d tablecolumns + indexes;\d+ tableeverything + sizes + descriptions;\sf funcfunction source;\sv viewview source;\dp tableprivileges. -
Tier 4 — output and timing.
\xtoggle expanded display (vertical);\timing onwall-clock per query;\pset format csv|aligned|html|jsonswitch output format;\pset null '␀'how NULLs display;\HHTML output mode;\o filesend output to file. -
Tier 5 — scripting.
\i file.sqlinclude a SQL file;\ir file.sqlinclude relative to current file;\set var valueset client variable;\gsetcapture query result into vars;\if :cond ... \elif ... \else ... \endifbranching;\echo :varprint to stderr;\copy table FROM 'f.csv' CSV HEADERclient-side bulk-load.
Two patterns that matter.
-
The
\dfamily glob pattern.\dt schema.prefix*lists every table inschemawhose name starts withprefix. Works for\dv,\df,\d,\dx— anywhere you can pass a name. -
The
+suffix.\d+,\l+,\dn+,\dt+all add "extra" columns — sizes, descriptions, owner info. Default to the+form when investigating.
Worked example — investigate a slow query end-to-end with only meta-commands
Detailed explanation. A nightly ETL job degraded from 4 minutes to 28 minutes. The root cause investigation needs to identify the slow query, check its plan, and verify whether the right index exists. The entire workflow fits in psql.
Question. Walk through the eight-command psql session that uses pg_stat_statements, EXPLAIN, \d+, and \timing to find a slow query, examine its plan, confirm the missing index, and verify the fix.
Input. The terminal connected to the database with pg_stat_statements enabled.
Code.
warehouse=> \dx pg_stat_statements
List of installed extensions
Name | Version | Schema | Description
--------------------+---------+--------+----------------------------------------
pg_stat_statements | 1.10 | public | track planning and execution statistics
warehouse=> SELECT query, calls, total_exec_time, mean_exec_time
-> FROM pg_stat_statements
-> ORDER BY total_exec_time DESC LIMIT 3;
query | calls | total_exec_time | mean_exec_time
---------------------------------------------------------+-------+-----------------+----------------
SELECT * FROM orders WHERE customer_id = $1 | 240 | 1680000.5 | 7000.00
SELECT * FROM events WHERE event_time >= $1 | 18 | 144320.0 | 8017.78
SELECT count(*) FROM sessions_daily | 480 | 12000.0 | 25.00
warehouse=> \timing on
Timing is on.
warehouse=> EXPLAIN ANALYZE SELECT * FROM orders WHERE customer_id = 123;
QUERY PLAN
---------------------------------------------------------------------------------------------------
Seq Scan on orders (cost=0.00..285000.00 rows=12 width=88) (actual time=42.3..7012.1 rows=12 loops=1)
Filter: (customer_id = 123)
Rows Removed by Filter: 8000000
Planning Time: 0.182 ms
Execution Time: 7012.5 ms
Time: 7013.041 ms
warehouse=> \d+ orders
Table "public.orders"
Column | Type | Nullable | ...
-------------+----------+----------+----
order_id | bigint | not null |
customer_id | bigint | |
...
Indexes:
"orders_pkey" PRIMARY KEY, btree (order_id)
"orders_order_date_idx" btree (order_date)
warehouse=> CREATE INDEX CONCURRENTLY orders_customer_id_idx ON orders(customer_id);
CREATE INDEX
Time: 41028.7 ms
warehouse=> EXPLAIN ANALYZE SELECT * FROM orders WHERE customer_id = 123;
QUERY PLAN
---------------------------------------------------------------------------------------------------
Index Scan using orders_customer_id_idx on orders (cost=0.43..38.92 rows=12) (actual time=0.041..0.18 rows=12 loops=1)
Index Cond: (customer_id = 123)
Planning Time: 0.122 ms
Execution Time: 0.231 ms
Time: 0.412 ms
Step-by-step explanation.
-
\dx pg_stat_statementsconfirms the extension is installed. If it were not,CREATE EXTENSION pg_stat_statements;plus a config change would be needed. - The query against
pg_stat_statementsranks queries by total time. The top offender accounts for 1.68 M ms (28 minutes) — matches the ETL slowdown. -
\timing onmakes the nextEXPLAIN ANALYZEprint wall-clock time alongside the planner's estimate. -
EXPLAIN ANALYZEconfirms the sequential scan filter on 8 M rows. 7 seconds per query × 240 calls = 28 minutes total — matches. -
\d+ orderslists indexes — there is no index oncustomer_id. Confirmed missing. -
CREATE INDEX CONCURRENTLYbuilds the missing index without locking out writes. Takes 41 seconds. - Re-run
EXPLAIN ANALYZE— now an Index Scan at 0.231 ms. 7000× faster. - Replay the ETL: should run in 4 minutes again.
Output. Five lines of psql produced a 30,000× per-query speedup. No GUI involved.
Rule of thumb. When a query slows down, the diagnostic path is always pg_stat_statements → EXPLAIN ANALYZE → \d+ → fix → re-run EXPLAIN ANALYZE. Memorise the four steps and the four meta-commands; you will use them weekly.
Worked example — \dt glob patterns vs catalog queries
Detailed explanation. \dt accepts a glob pattern that matches schema and table names. The glob is identical to the LIKE patterns used by pg_catalog queries, with * as wildcard. Knowing both lets you pick the right tool — glob for interactive use, catalog query for scripts.
Question. Show the four equivalent ways to list every table in the analytics schema whose name starts with events_: glob, regex, catalog query, and information_schema query.
Input. A connected psql session.
Code.
-- 1) psql glob (interactive)
warehouse=> \dt analytics.events_*
-- 2) psql with regex via ~ operator (less common, but works)
warehouse=> \dt analytics.~events_.*
-- 3) Catalog query (portable to SDK)
warehouse=> SELECT n.nspname, c.relname
-> FROM pg_class c
-> JOIN pg_namespace n ON n.oid = c.relnamespace
-> WHERE n.nspname = 'analytics'
-> AND c.relname LIKE 'events_%'
-> AND c.relkind = 'r';
-- 4) information_schema query (ANSI-standard, slower)
warehouse=> SELECT table_schema, table_name
-> FROM information_schema.tables
-> WHERE table_schema = 'analytics'
-> AND table_name LIKE 'events_%'
-> AND table_type = 'BASE TABLE';
Step-by-step explanation.
- The glob form is the shortest — 26 characters. Use this when typing interactively.
- psql also accepts a regex if you prefix with
~, but few engineers know this; stick to glob. - The
pg_catalogquery is what\dtexpands to. Portable to any SDK because it is just SQL. Filterc.relkind = 'r'for tables; use'v'for views,'i'for indexes,'p'for partitioned tables,'m'for materialised views. - The
information_schemaform is ANSI-standard and works on every Postgres-compatible DB (CockroachDB, Yugabyte, etc.). Slightly slower because it does extra view-resolution work.
Output. All four return the same table list. Pick by context: interactive uses glob; SDK uses pg_catalog; multi-database tooling uses information_schema.
Rule of thumb. For ad-hoc use, glob. For automation, pg_catalog. For portable tools, information_schema. Run psql with -E once to see exactly which catalog query each \d command expands to.
Worked example — \x expanded display rescues wide rows
Detailed explanation. A row with 30 columns is unreadable in the default aligned-table output. \x toggles expanded display — each row prints as one column per line, vertically. The same data, infinitely more readable for any row wider than your terminal.
Question. Run a SELECT against pg_stat_activity (16 columns) in the default mode, then in expanded mode, and show why one is unreadable.
Input. A connected psql session.
Code.
-- Default — unreadable for wide rows
warehouse=> SELECT * FROM pg_stat_activity WHERE state = 'active' LIMIT 1;
datid | datname | pid | usesysid | usename | application_name | client_addr | client_port | backend_start
-------+---------+-----+----------+---------+------------------+-------------+-------------+--------------
16384 | wareh.. | 71 | 16385 | analy.. | psql | 10.0.0.1 | 53412 | 2026-06-06 ..
(1 row, but wraps across the terminal)
-- Toggle expanded
warehouse=> \x
Expanded display is on.
warehouse=> SELECT * FROM pg_stat_activity WHERE state = 'active' LIMIT 1;
-[ RECORD 1 ]----+-----------------------
datid | 16384
datname | warehouse
pid | 71
usesysid | 16385
usename | analytics
application_name | psql
client_addr | 10.0.0.1
client_port | 53412
backend_start | 2026-06-06 09:12:00+00
state | active
query | SELECT count(*) FROM events;
Step-by-step explanation.
- The default aligned format works for narrow tables but is unusable past 8 to 10 columns — the row wraps and you cannot tell which value belongs to which column.
-
\xtoggles the expanded display flag. Subsequent queries print one column per line. -
\x autois even better — psql automatically expands when the row would exceed the terminal width. This is the recommended default. - To make
\x autothe global default, add\pset expanded autoto~/.psqlrc.
Output. Same query, fundamentally different readability. The expanded form is the only sane way to inspect pg_stat_activity, pg_class, pg_stat_statements, or any other wide catalog table.
Rule of thumb. Put \pset expanded auto in your ~/.psqlrc. Every wide-table inspection auto-expands; every narrow one stays aligned. Zero per-query thought required.
PostgreSQL interview question on production database inspection
A senior interviewer might frame this as: "You SSH into a database server you've never seen. Give me the eight psql commands you'd run in the first 60 seconds to size up the cluster and find the largest table."
Solution Using a six-command inspection ritual
-- 1) Confirm the connection and current role
warehouse=> \conninfo
You are connected to database "warehouse" as user "analytics" via socket in "/var/run/postgresql" at port "5432".
-- 2) List databases by size
warehouse=> \l+
List of databases
Name | Owner | Encoding | Collate | Ctype | Size | Description
-----------+-------------+----------+------------+------------+---------+-------------------
warehouse | postgres | UTF8 | en_US.UTF8 | en_US.UTF8 | 482 GB | analytics warehouse
staging | postgres | UTF8 | en_US.UTF8 | en_US.UTF8 | 12 GB | staging
postgres | postgres | UTF8 | en_US.UTF8 | en_US.UTF8 | 80 MB |
-- 3) List schemas in the largest DB
warehouse=> \dn+
List of schemas
Name | Owner | Access privileges | Description
-----------+---------------+--------------------+---------------
analytics | analytics_owner | analytics_owner=UC | DE-owned mart
public | postgres | postgres=UC | default
-- 4) Top 10 tables by size, all schemas
warehouse=> SELECT schemaname || '.' || relname AS table,
-> pg_size_pretty(pg_total_relation_size(schemaname || '.' || relname)) AS size
-> FROM pg_stat_user_tables
-> ORDER BY pg_total_relation_size(schemaname || '.' || relname) DESC LIMIT 10;
-- 5) Look at the biggest one
warehouse=> \d+ analytics.events
-- 6) Active sessions and longest-running queries
warehouse=> \x auto
warehouse=> SELECT pid, usename, state, query_start, query
-> FROM pg_stat_activity
-> WHERE state = 'active'
-> ORDER BY query_start LIMIT 5;
Step-by-step trace.
| Step | Command | Insight gained |
|---|---|---|
| 1 | \conninfo |
I am on socket, role analytics, port 5432 |
| 2 | \l+ |
three DBs, warehouse is 482 GB |
| 3 | \dn+ |
analytics schema owned by analytics_owner |
| 4 |
pg_stat_user_tables query |
events is 412 GB, orders is 58 GB |
| 5 | \d+ analytics.events |
partitioned by month, three indexes, jsonb payload |
| 6 |
pg_stat_activity query |
one query running 12 minutes — investigate |
Six commands, 60 seconds, complete situational awareness. No GUI could match this in the same time.
Output:
| Question | Answered by |
|---|---|
| Where am I connected? | \conninfo |
| What databases exist? | \l+ |
| What schemas exist? | \dn+ |
| What's the largest table? | catalog query |
| What does it look like? | \d+ |
| What's running right now? | pg_stat_activity |
Why this works — concept by concept:
- \conninfo first — never assume the role or database; the first line of an inspection script confirms what server you're on.
-
\l+ and \dn+ with the plus suffix — adds size and description columns. Always default to the
+form during investigation. - pg_stat_user_tables — the catalog view aggregating per-table size including indexes and TOAST. The "show me the biggest tables" canonical query.
- \d+ for one-stop inspection — columns, types, indexes, partitions, descriptions, sizes — all in one screen.
- pg_stat_activity with \x auto — wide rows display vertically; expanded display turns 12 columns into 12 lines you can read.
- Cost — every catalog query is O(rows in catalog), typically milliseconds even on a database with thousands of tables.
SQL
Topic — SQL
SQL practice problems
3. COPY vs \copy — the bulk-load showdown
COPY runs on the server filesystem; \copy streams over the wire from the client — pick by where the file lives, not by what feels faster
The mental model in one line: COPY ... FROM '/path' reads or writes a file on the PostgreSQL server's filesystem; \copy ... FROM 'path' opens the file on the client machine and streams the bytes through psql to the server's COPY FROM STDIN. Once you internalise that one-sentence split, the entire COPY-vs-\copy decision collapses to "where is my file?"
The decision matrix in one table.
| Factor | Server-side COPY
|
Client-side \copy
|
|---|---|---|
| File location | PostgreSQL server filesystem | machine running psql |
| Permission needed | superuser OR pg_read_server_files
|
any connecting role |
| Speed | fastest — no client→server byte trip | slower — bytes traverse libpq |
| Works on cloud-hosted PG? | NO — RDS / Aurora / Cloud SQL forbid it | YES — the only choice for managed PG |
| Format options | CSV, TEXT, BINARY, plus WITH options | identical |
| FROM PROGRAM | yes (server runs the program) | yes (client runs the program) |
| Use case | bulk ETL from staged files on the DB host | laptop → cloud, CI scripts, remote ingest |
Format options that always apply.
-
FORMAT CSV— comma-separated, default delimiter,, default null marker is empty string. Pair withHEADERto skip the first row. -
FORMAT TEXT— tab-separated, default null marker\N. The PostgreSQL native format; fastest for raw text. -
FORMAT BINARY— PostgreSQL's binary on-disk encoding. Fastest for raw throughput; only useful when both source and target are PostgreSQL. -
WITH (FORMAT CSV, HEADER, DELIMITER ',', NULL '\N', QUOTE '"', ESCAPE '"')— the explicit form. Default everything if you trust the source.
Performance tuning for bulk loads.
-
Drop indexes first, rebuild after. Loading 50 GB into an indexed table is 5× slower than loading then
CREATE INDEXafter. The post-load build is bulk-optimised. -
SET synchronous_commit = OFF— disables fsync per transaction; on RDS this cuts wall-clock by 30–60%. Crash-recovery is still safe; only the last few seconds of commits are at risk. -
Disable triggers.
ALTER TABLE t DISABLE TRIGGER ALL;— triggers fire per row. Disable for the load, re-enable after. -
Use the
BINARYformat when both ends are PostgreSQL. Skips the ASCII-to-binary parse step. -
UNLOGGEDtable. Drop the WAL write for the load:ALTER TABLE t SET UNLOGGED;— thenALTER TABLE t SET LOGGED;after. Saves another 30%. -
COPY ... FREEZE. Marks newly-inserted tuples as frozen, skipping a later vacuum freeze. Requires the table to have been truncated or created in the same transaction.
COPY ... FROM PROGRAM and \copy ... FROM PROGRAM.
-
Server-side
FROM PROGRAM.COPY t FROM PROGRAM 'curl -s https://...'— the server runs the program. Needs superuser; the program runs as the postgres OS user. -
Client-side
\copy ... FROM PROGRAM. Same syntax with a backslash. The program runs on the client machine as the user running psql. Streams the program's stdout intoCOPY FROM STDIN. -
Use case. Bulk-load a CSV directly from S3 without a temp file:
\copy t FROM PROGRAM 'aws s3 cp s3://bucket/file.csv -' WITH (FORMAT CSV, HEADER).
Worked example — load a 10 GB CSV into a remote RDS instance
Detailed explanation. RDS is a managed Postgres service — there is no shell on the database host, so server-side COPY is unavailable. The only path is client-side \copy. Pair with pv for a progress bar and SET synchronous_commit = OFF for speed.
Question. Write the full one-command pipeline to load a 10 GB CSV from a laptop into an RDS database, with a progress bar and pre-load tuning.
Input. A 10 GB CSV named events.csv on the laptop, an RDS endpoint db.cluster-x.us-east-1.rds.amazonaws.com.
Code.
# Pre-load tuning + load + post-load checks, in one psql session
psql "host=db.cluster-x.us-east-1.rds.amazonaws.com \
port=5432 \
user=analytics \
dbname=warehouse \
sslmode=verify-full" <<'EOF'
\timing on
-- 1) Pre-load: drop non-PK indexes, disable triggers, async commit
SET synchronous_commit = OFF;
ALTER TABLE events DISABLE TRIGGER ALL;
DROP INDEX IF EXISTS events_user_id_idx;
DROP INDEX IF EXISTS events_event_time_idx;
-- 2) Bulk-load via \copy
\copy events (event_id, user_id, event_time, payload) \
FROM PROGRAM 'pv -L 100m events.csv' \
WITH (FORMAT CSV, HEADER, NULL '\N')
-- 3) Post-load: rebuild indexes, re-enable triggers, analyze
CREATE INDEX CONCURRENTLY events_user_id_idx ON events(user_id);
CREATE INDEX CONCURRENTLY events_event_time_idx ON events(event_time);
ALTER TABLE events ENABLE TRIGGER ALL;
ANALYZE events;
EOF
Step-by-step explanation.
- The connection string uses key-value form with
sslmode=verify-full— the only safe SSL mode for prod. RDS supports TLS by default. -
SET synchronous_commit = OFFskips the fsync per transaction. On RDS this cuts wall-clock by 30–60%. -
DISABLE TRIGGER ALLplus dropping the non-PK indexes turns a 5-hour load into a 30-minute load. The cost is rebuilding the indexes after. -
\copy ... FROM PROGRAM 'pv -L 100m events.csv'streams the file throughpv(progress bar, rate-limited to 100 MB/s to avoid saturating the laptop's uplink) and into theeventstable. -
CREATE INDEX CONCURRENTLYafter the load means the index build is single-threaded but non-blocking. For an offline window, drop the CONCURRENTLY for a 3× faster build. -
ANALYZE eventsupdates the planner stats so subsequent queries can pick the new indexes.
Output. A 10 GB ingest in 30 minutes with a live progress bar, on a remote managed database that explicitly forbids server-side COPY.
Rule of thumb. Any bulk-load to a cloud-hosted PG uses \copy + pv + the four-step tuning pattern (async commit, disable triggers, drop indexes, ANALYZE after). Memorise the pattern; it is the single most common 30-line shell script in a DE's toolkit.
Worked example — export a query to CSV with \copy ... TO
Detailed explanation. The export side of \copy takes a SELECT and streams the result rows as CSV to a local file. No intermediate tool needed. Pairs with gzip via shell pipe for compressed exports.
Question. Export every order from 2026 to a gzipped CSV on the laptop, using \copy plus FROM PROGRAM for the gzip step.
Input. A connected psql session to a remote DB; the orders table.
Code.
# Direct export
psql -h db.prod -U analytics -d warehouse -c "
\\copy (
SELECT order_id, customer_id, amount, order_date
FROM orders
WHERE order_date >= '2026-01-01'
) TO 'orders_2026.csv' WITH (FORMAT CSV, HEADER)
"
# Gzipped via shell pipe
psql -h db.prod -U analytics -d warehouse -c "
\\copy (
SELECT order_id, customer_id, amount, order_date
FROM orders
WHERE order_date >= '2026-01-01'
) TO PROGRAM 'gzip > orders_2026.csv.gz' WITH (FORMAT CSV, HEADER)
"
# Server-side equivalent (if you have shell access to the DB host)
psql -c "
COPY (
SELECT order_id, customer_id, amount, order_date
FROM orders
WHERE order_date >= '2026-01-01'
) TO '/tmp/orders_2026.csv' WITH (FORMAT CSV, HEADER)
"
Step-by-step explanation.
- The first form opens
orders_2026.csvon the laptop and writes CSV rows as they arrive from the server. Memory cost is constant — psql streams row-by-row. - The
FROM PROGRAM 'gzip > orders_2026.csv.gz'form sends the CSV stream to gzip's stdin. The compressed file lands on the laptop directly; no temp CSV ever exists. - The third form uses server-side
COPYand writes to/tmpon the DB host. Only works if you have shell access to that host (read: not RDS). -
(SELECT ...)in COPY/\copylets you export an arbitrary query result, not just a table. Equivalent to writing a CTE then copying — but one line shorter.
Output. A gzipped CSV of every 2026 order, streamed end-to-end without an intermediate temp file. Compression saves 70–90% of the disk.
Rule of thumb. Default to \copy (SELECT ...) TO PROGRAM 'gzip > file.csv.gz' for any export over 1 GB. The 30% speed cost of gzip is worth the 80% disk savings.
Worked example — COPY (SELECT ...) TO STDOUT piping data between databases
Detailed explanation. The cleanest way to move a table from one Postgres to another is COPY ... TO STDOUT | psql -h target -c "COPY ... FROM STDIN". The bytes never touch disk; the two backends are connected end-to-end by a shell pipe.
Question. Copy the orders table from db.prod to db.staging over a single shell pipe.
Input. Two reachable Postgres endpoints, network connectivity from your machine to both.
Code.
# Stream prod → staging through your terminal
psql -h db.prod -U analytics -d warehouse \
-c "COPY orders TO STDOUT WITH (FORMAT BINARY)" \
| psql -h db.staging -U analytics -d warehouse \
-c "COPY orders FROM STDIN WITH (FORMAT BINARY)"
Step-by-step explanation.
- The left
psqlopens a connection to prod, issuesCOPY orders TO STDOUT. The server streams the table's contents in PostgreSQL binary format to psql's stdout. - The shell pipe routes those bytes to the right
psql's stdin. - The right
psqlopens a connection to staging, issuesCOPY orders FROM STDIN. psql streams stdin into the server, which interprets the binary format and inserts rows. - Both backends are streaming — memory is constant. The only bottleneck is the laptop's uplink (and the slower of the two DB endpoints).
-
FORMAT BINARYsaves 30–50% on bytes-over-wire vs CSV, and skips the ASCII parse step on both ends.
Output. A full table copy with no intermediate file, no schema dump, no migration tool. Pure pipe.
Rule of thumb. When both endpoints are Postgres and you want a table-level copy, this two-psql pipe is shorter than every alternative. Add pv -ab between them for a live throughput readout.
PostgreSQL interview question on bulk-load architecture
A senior interviewer often asks: "Your team needs to ingest a 200 GB CSV into a managed Postgres database every night. Walk me through the bulk-load architecture, the COPY vs \copy decision, and the tuning flags you'd set."
Solution Using client-side \copy with a four-step tuning pattern
#!/usr/bin/env bash
# nightly_ingest.sh — 200 GB CSV → RDS Postgres
set -euo pipefail
PGHOST=db.cluster-x.us-east-1.rds.amazonaws.com
PGUSER=analytics
PGDATABASE=warehouse
PGSSLMODE=verify-full
export PGHOST PGUSER PGDATABASE PGSSLMODE
# 1) Pre-load (transactional, fast)
psql -v ON_ERROR_STOP=1 <<'EOF'
BEGIN;
SET synchronous_commit = OFF;
ALTER TABLE events SET UNLOGGED;
ALTER TABLE events DISABLE TRIGGER ALL;
COMMIT;
EOF
# 2) Bulk-load — \copy streams the file through psql
psql -v ON_ERROR_STOP=1 -c "
\\copy events FROM PROGRAM 'pv events_${1}.csv' \
WITH (FORMAT CSV, HEADER, NULL '\\\\N')
"
# 3) Rebuild indexes (parallel where possible)
psql -v ON_ERROR_STOP=1 <<'EOF'
SET max_parallel_maintenance_workers = 4;
REINDEX TABLE CONCURRENTLY events;
EOF
# 4) Post-load — restore durability, re-enable triggers, analyze
psql -v ON_ERROR_STOP=1 <<'EOF'
ALTER TABLE events SET LOGGED;
ALTER TABLE events ENABLE TRIGGER ALL;
ANALYZE events;
EOF
Step-by-step trace.
| Phase | Wall-clock | Why |
|---|---|---|
| Pre-load (UNLOGGED, async commit, no triggers) | 0.1 s | one transaction, three ALTERs |
\copy with pv |
38 min | streams 200 GB at ~90 MB/s through libpq |
| REINDEX CONCURRENTLY | 12 min | parallel, non-blocking |
| Post-load (LOGGED, triggers, ANALYZE) | 2 min | small ALTERs + planner stats refresh |
| Total | ~52 min | down from 4+ hours naive |
The script is idempotent — re-running it picks up where it left off because ALTER TABLE ... SET UNLOGGED is a no-op on an already-unlogged table.
Output:
| Step | What changed |
|---|---|
| 1 | events table marked UNLOGGED + async commit + no triggers |
| 2 | 200 GB ingested via \copy |
| 3 | all indexes rebuilt |
| 4 | durability restored, triggers re-enabled, planner stats fresh |
Why this works — concept by concept:
-
\copy is the only choice on RDS — managed Postgres forbids server-side
COPY ... FROM '/path'because the client cannot reach the DB host's filesystem.\copyruns on your machine, streams to the server. - UNLOGGED table for the load — skips the WAL write per row. Cuts wall-clock by ~30% on most workloads. Restore to LOGGED after the load.
- synchronous_commit = OFF — relaxes durability for the load. Crash-recovery still works; only the last few seconds of commits are at risk.
- DISABLE TRIGGER ALL — triggers fire per row. For a 200 GB load that is 1.4 B trigger fires. Disable for the load, re-enable after.
- REINDEX CONCURRENTLY — rebuilds indexes from the freshly-loaded data without blocking reads. Parallel-aware on PG 14+.
- ANALYZE last — refreshes planner stats so subsequent queries use the new index sizes and row counts.
- Cost — O(rows) for the load; O(rows · log rows) for each index rebuild. Disk I/O is the bottleneck; the script wrings every cycle out of it.
SQL
Topic — ETL
ETL bulk-load problems
4. Connection strings, .pgpass, and the auth flow
libpq merges five input sources in a strict priority order before opening the TCP connection — knowing the order eliminates 90% of "wrong host" tickets
The mental model in one line: libpq reads connection parameters from CLI args, the connection URL, environment variables, ~/.pg_service.conf, and ~/.pgpass in that priority order — the first source to supply a value wins, and the merged result is what opens the TCP connection. Once you know the order, every "why am I connecting to the wrong host?" question becomes a five-step audit.
The four ways to specify a connection.
| Form | Example | When to use |
|---|---|---|
| Positional CLI args | psql -h db -p 5432 -U me -d warehouse |
one-off, easy to type |
| Connection URL | psql 'postgresql://me:pw@db:5432/warehouse?sslmode=require' |
one-liner with embedded password, great for env vars |
| Key-value string | psql 'host=db port=5432 user=me dbname=warehouse sslmode=verify-full' |
most readable for many options |
| Env vars | PGHOST=db PGUSER=me psql |
the CI / Docker default |
The full set of PG* env vars.
-
PGHOST— server hostname or socket path (default: socket). -
PGPORT— TCP port (default: 5432). -
PGUSER— database role (default: OS user). -
PGPASSWORD— password in plain text; OK for ephemeral CI tokens, never commit. -
PGDATABASE— database name (default: same as PGUSER). -
PGSERVICE— named profile from~/.pg_service.confor/etc/pg_service.conf. -
PGSSLMODE—disable/allow/prefer/require/verify-ca/verify-full. -
PGSSLROOTCERT— path to the CA root cert used for SSL verification. -
PGCONNECT_TIMEOUT— seconds before psql gives up on the TCP connect. -
PGAPPNAME— string visible inpg_stat_activity.application_name; tag your sessions.
The .pgpass file — your password store of last resort.
-
Location.
~/.pgpasson Unix,%APPDATA%\postgresql\pgpass.confon Windows. -
Format. One line per credential:
host:port:database:user:password. Use*as a wildcard. -
Permissions. Must be
chmod 600— libpq silently ignores any.pgpassfile world- or group-readable. No warning, no error. Senior DE gotcha. - Example.
# ~/.pgpass — chmod 600 required
db.prod.example.com:5432:warehouse:analytics:s3cret_prod_token
db.staging.example.com:5432:*:analytics:staging_token
*:5432:postgres:postgres:local_only_admin
~/.pg_service.conf — named connection profiles.
# ~/.pg_service.conf
[prod-warehouse]
host=db.prod.example.com
port=5432
user=analytics
dbname=warehouse
sslmode=verify-full
sslrootcert=/etc/ssl/aws-rds-bundle.pem
[staging-warehouse]
host=db.staging.example.com
port=5432
user=analytics
dbname=warehouse
sslmode=require
Then psql service=prod-warehouse connects with all those parameters. Pair with ~/.pgpass for the password. Migrating to a new password is a one-line edit instead of updating every CI job.
The five SSL modes and what each guarantees.
| sslmode | Encrypted? | Server cert verified? | MITM-safe? |
|---|---|---|---|
disable |
no | no | no |
allow |
maybe | no | no |
prefer |
yes if available | no | no |
require |
yes | no | no — accepts any cert |
verify-ca |
yes | chain verified | partially — DNS hijack still wins |
verify-full |
yes | chain + hostname verified | yes |
Use verify-full in production. Always. Every other mode either skips encryption or skips the verification that prevents man-in-the-middle attacks.
IAM authentication on RDS and Cloud SQL.
-
RDS.
aws rds generate-db-auth-token --hostname $PGHOST --port 5432 --username analyticsreturns a short-lived (15-minute) token. SetPGPASSWORDto it and connect. No static passwords in CI. -
Cloud SQL.
gcloud sql generate-login-tokenreturns an equivalent token. Or use the Cloud SQL Auth Proxy which handles the token transparently. -
Why this matters. Static passwords end up in
git logor in CI logs. Short-lived tokens cannot be exfiltrated for long.
Worked example — the libpq priority order in action
Detailed explanation. A new engineer sets PGHOST=db.staging in their shell, then types psql -h db.prod -U me -d warehouse. They expect to land on staging because the env var is set. They land on prod because CLI args override env vars. This is a daily source of confusion until you internalise the priority order.
Question. Given a session with PGHOST=db.staging and PGUSER=etl exported, what does psql -h db.prod -d warehouse connect to? Walk through the merge.
Input.
export PGHOST=db.staging
export PGUSER=etl
export PGPASSWORD=staging_token
# ~/.pgpass also exists:
# db.prod:5432:warehouse:prod_user:prod_token
Code.
$ psql -h db.prod -d warehouse
# Which host? Which user? Which password?
Step-by-step explanation.
-
CLI -h wins for host.
db.prodoverridesPGHOST=db.staging. Host =db.prod. -
CLI did not supply -U. Falls to env var. User =
etl. -
CLI did not supply -d directly via -U convention.
-d warehousewas supplied. Database =warehouse. -
Password resolution. PGPASSWORD is set → that wins over
.pgpass. Password =staging_token. -
The result. Connect to
db.prodasetlwith passwordstaging_token. This will likely fail authentication because thestaging_tokenis the wrong password for prod.
The fix in this scenario is to unset PGPASSWORD so .pgpass is consulted, OR pass -U prod_user to match the .pgpass entry.
Output. A connection failure with a useful error message: psql: error: connection to server at "db.prod" failed: FATAL: password authentication failed for user "etl".
Rule of thumb. When a connection fails, run psql -h ... --echo-all -L /dev/stderr and look at the resolved parameters. Or just run \conninfo immediately after a successful connection to confirm where you actually landed.
Worked example — .pgpass chmod 600 silent failure
Detailed explanation. .pgpass permissions matter. If the file is chmod 644 or 0666, libpq silently ignores it — no warning, no error. Connections that worked yesterday fail today after a git pull reset the file mode. This is the single most common .pgpass gotcha.
Question. A .pgpass exists with the right entries, but psql still prompts for a password. Show the diagnostic and fix.
Input.
$ ls -la ~/.pgpass
-rw-r--r-- 1 me me 87 Jun 6 09:14 /home/me/.pgpass
$ cat ~/.pgpass
db.prod:5432:warehouse:analytics:correct_password
Code.
# Diagnostic
$ PGPASSWORD='' psql -h db.prod -U analytics -d warehouse
Password for user analytics: # libpq did not read .pgpass
# Check the mode
$ stat -c '%a %n' ~/.pgpass
644 /home/me/.pgpass # too permissive
# Fix
$ chmod 600 ~/.pgpass
$ stat -c '%a %n' ~/.pgpass
600 /home/me/.pgpass
# Retry
$ psql -h db.prod -U analytics -d warehouse
warehouse=> # connected without prompt
Step-by-step explanation.
- The initial connection prompts for a password — first signal that
.pgpassis not being read. -
PGPASSWORD=''explicitly empties the env var so it cannot supply the password; this forces libpq to fall through to.pgpass. -
stat -c '%a %n'shows the file mode is 644 (world-readable). libpq requires 600 (owner read/write only). -
chmod 600 ~/.pgpasstightens permissions. No need to restart anything; libpq reads.pgpasson every connection. - The retry succeeds silently — no password prompt.
Output. The connection now uses .pgpass instead of an interactive prompt. Same for every CI script that depends on the file.
Rule of thumb. Any .pgpass change should be followed by ls -la ~/.pgpass. If the mode is not -rw-------, fix it. The silent-ignore is the #1 source of "but it worked yesterday" tickets.
Worked example — connecting via SSH tunnel + SSL + IAM in one command
Detailed explanation. Many production databases are inside a VPC and only reachable via a bastion host. The connection is SSH → bastion → DB, with SSL on the DB side and IAM-issued short-lived tokens for auth. The whole chain fits in one shell command.
Question. Write the one-command psql invocation that SSH-tunnels to a bastion, opens an SSL connection to an RDS instance, and authenticates with an IAM token instead of a static password.
Input. A bastion bastion.example.com, an RDS endpoint reachable from inside the VPC, AWS CLI configured with credentials.
Code.
# 1) Open SSH tunnel in background — port 25432 on laptop → 5432 on RDS
ssh -fN -L 25432:db.cluster-x.us-east-1.rds.amazonaws.com:5432 \
ec2-user@bastion.example.com
# 2) Generate an IAM auth token (15-minute lifetime)
export PGPASSWORD=$(aws rds generate-db-auth-token \
--hostname db.cluster-x.us-east-1.rds.amazonaws.com \
--port 5432 \
--username analytics)
# 3) Connect via the tunnel, with SSL verification of the RDS cert
psql "host=localhost \
port=25432 \
user=analytics \
dbname=warehouse \
sslmode=verify-full \
sslrootcert=/etc/ssl/aws-rds-bundle.pem"
Step-by-step explanation.
-
ssh -fN -Lopens a background tunnel: laptop'slocalhost:25432now forwards to RDS port 5432 through the bastion. The-fdaemonises,-Nskips opening a shell. -
aws rds generate-db-auth-tokencalls the RDS API and returns a short-lived (15-min) signed token. We export it asPGPASSWORD. - The psql connection string targets
localhost:25432(the tunnel) but tells libpq to verify SSL againstaws-rds-bundle.pem— even though the TCP connection is local, the TLS handshake is end-to-end with RDS, so the cert verification still works. -
sslmode=verify-fullverifies both the cert chain and the hostname. Without this, a malicious bastion could MITM the connection. - The session lasts as long as the token is valid (15 min). Re-running step 2 refreshes it.
Output. A fully authenticated SSL connection from your laptop, through a bastion, to a private RDS instance, with no static passwords anywhere in the chain.
Rule of thumb. Production database access from a developer laptop should look like this: SSH tunnel + SSL verify-full + IAM token. Anything simpler is a security audit finding waiting to happen.
PostgreSQL interview question on connection string design
A senior interviewer might ask: "Your team has 30 microservices connecting to the same Postgres cluster. Design the connection-string strategy so password rotation is one file edit and SSL is always verify-full." This is the operational-design probe.
Solution Using pg_service.conf + .pgpass + verify-full
# ~/.pg_service.conf — shipped to every container as a config-map
[warehouse-prod]
host=db.cluster-x.us-east-1.rds.amazonaws.com
port=5432
user=analytics
dbname=warehouse
sslmode=verify-full
sslrootcert=/etc/ssl/aws-rds-bundle.pem
application_name=service-orders
[warehouse-replica]
host=db-replica.cluster-x.us-east-1.rds.amazonaws.com
port=5432
user=analytics_read
dbname=warehouse
sslmode=verify-full
sslrootcert=/etc/ssl/aws-rds-bundle.pem
application_name=service-orders-read
# ~/.pgpass — generated at deploy time from a secret manager, chmod 600
db.cluster-x.us-east-1.rds.amazonaws.com:5432:warehouse:analytics:CURRENT_TOKEN
db-replica.cluster-x.us-east-1.rds.amazonaws.com:5432:warehouse:analytics_read:CURRENT_READ_TOKEN
# Service connects with:
psql service=warehouse-prod
psql service=warehouse-replica
Step-by-step trace.
| Operation | Edit needed | Services affected |
|---|---|---|
| Rotate prod password | one line in .pgpass
|
every service that uses warehouse-prod
|
| Add SSL pinning | one line in .pg_service.conf
|
every consumer of the changed service |
| Add a new read replica | one new [warehouse-replica-2] block + .pgpass line |
only services that opt in |
Change application_name per service |
edit the per-service .pg_service.conf block |
one service |
The 30 services all consume named profiles. The credentials live in one place; the connection metadata lives in another.
Output:
| Layer | Source of truth | Permission |
|---|---|---|
| Connection metadata | ~/.pg_service.conf |
shipped via config-map |
| Passwords | ~/.pgpass |
chmod 600, generated at deploy from secrets manager |
| SSL trust anchor | /etc/ssl/aws-rds-bundle.pem |
shipped with the container |
| Service code | psql service=warehouse-prod |
no credentials in source |
Why this works — concept by concept:
- pg_service.conf as named profiles — every connection metadata field lives in one file. Services reference profiles by name; no string-concatenation in the code.
- .pgpass as the password store — separate from the metadata file. Generated at deploy from a secrets manager (Vault, AWS Secrets Manager, Kubernetes secrets), chmod 600 enforced.
- sslmode=verify-full + sslrootcert — the only safe SSL configuration for production. Every profile has it; no service can accidentally skip verification.
-
application_name per service —
pg_stat_activityshows which service is running which query. Indispensable for incident response. -
Rotation is one file edit — replacing the token in
.pgpass(or running asecret-updatejob that regenerates the file) rotates every service simultaneously, without code changes. - Cost — zero runtime overhead. libpq parses both files once per process; the merged result is cached for the connection lifetime.
SQL
Topic — database
Database design and connectivity (SQL)
5. Scripting psql — variables, conditionals, and CI-friendly runs
psql commands like \set, \gset, \if, ON_ERROR_STOP, and --single-transaction turn psql into a real scripting host — every migration framework eventually depends on this surface
The mental model in one line: psql variables live on the client and expand via :var (literal), :'var' (quoted literal), or :"var" (quoted identifier); \gset captures query results into variables; \if branches; ON_ERROR_STOP=1 aborts on first error. Master these and you can write idempotent, branching migrations that run identically in dev, staging, and production.
Single command vs script — the -c vs -f choice.
-
psql -c "SQL"— one-shot. Connection opened, single SQL string executed, connection closed. Ideal for one-liner shell automation. -
psql -f file.sql— full script. The file can contain multiple SQL statements, meta-commands, comments, and\iincludes. The connection lasts until the file ends. -
psql -v ON_ERROR_STOP=1 -f file.sql— the single most important CI flag. Without it, a failing statement is logged but the script continues; with it, the script aborts immediately. -
psql --single-transaction -f file.sql— wraps the entire-ffile inBEGIN; ... COMMIT;. Combined withON_ERROR_STOP, you get all-or-nothing migration semantics for free.
Variables and substitution.
-
\set var value— sets a client-side variable. Value is stored as text. -
:var— expands the variable literally (no quoting). Use for SQL identifiers and numbers. -
:'var'— expands the variable as a quoted string literal. Use for string values in WHERE clauses. -
:"var"— expands the variable as a quoted identifier (table or column name). Use for dynamic table names. -
\unset var— removes the variable. -
\echo :var— prints the variable's value to stderr; useful for debugging.
\gset — capture a query result into variables.
SELECT count(*) AS n FROM orders \gset
\echo Loaded :n orders.
The trailing \gset (no semicolon!) captures column names from the SELECT into psql variables. After this, :n expands to the row count. Indispensable for idempotent migration scripts.
Conditionals — \if / \elif / \else / \endif.
SELECT exists(
SELECT 1 FROM information_schema.tables
WHERE table_schema = 'public' AND table_name = 'orders'
) AS has_orders \gset
\if :has_orders
\echo orders table exists - skipping create
\else
\echo creating orders table
CREATE TABLE orders (...);
\endif
Note that \if evaluates psql-side strings (t / f for booleans). Use with \gset to drive branching from real database state.
CI-friendly output flags.
| Flag | Effect | Use case |
|---|---|---|
--csv |
output rows as CSV | downstream awk / jq / pandas |
--tuples-only (-t) |
drop headers and footers | clean numeric output |
--no-align (-A) |
no column padding | `key |
{% raw %}--field-separator (-F) |
custom separator | TSV: -F$'\t'
|
\pset format json |
JSON rows | structured logging |
\pset format unaligned |
bare values | shell-friendly |
The "ON_ERROR_STOP everywhere" pattern.
-
Always start CI scripts with
psql -v ON_ERROR_STOP=1. Without it, a failed CREATE INDEX is logged but the script proceeds to the next statement — silently leaving the database in a half-migrated state. - For interactive sessions, set it in
~/.psqlrc:\set ON_ERROR_STOP on. Saves you from typo-rollback regret.
Worked example — an idempotent migration with \gset and \if
Detailed explanation. A migration script must be safe to re-run. The script checks whether each change has already been applied, and skips it if so. \gset + \if is the canonical psql pattern; no separate migration framework required.
Question. Write a migration that adds an email_verified column to users and an index on it, but skips both operations if they already exist.
Input. A users table that may or may not already have the column / index.
Code.
-- migrate_001_email_verified.sql
\set ON_ERROR_STOP on
\timing on
-- 1) Check whether the column exists
SELECT exists(
SELECT 1
FROM information_schema.columns
WHERE table_schema = 'public'
AND table_name = 'users'
AND column_name = 'email_verified'
) AS col_exists \gset
\if :col_exists
\echo SKIP — column users.email_verified already exists
\else
\echo APPLY — adding column users.email_verified
ALTER TABLE users ADD COLUMN email_verified boolean NOT NULL DEFAULT false;
\endif
-- 2) Check whether the index exists
SELECT exists(
SELECT 1
FROM pg_indexes
WHERE schemaname = 'public'
AND tablename = 'users'
AND indexname = 'users_email_verified_idx'
) AS idx_exists \gset
\if :idx_exists
\echo SKIP — index users_email_verified_idx already exists
\else
\echo APPLY — building index users_email_verified_idx
CREATE INDEX CONCURRENTLY users_email_verified_idx
ON users(email_verified)
WHERE email_verified = true;
\endif
\echo Migration complete.
Step-by-step explanation.
-
\set ON_ERROR_STOP onaborts the script on the first error. Combined with the\ifbranches, this gives clean all-or-nothing semantics. - The first SELECT against
information_schema.columnsreturns one row with a boolean columncol_exists. The trailing\gsetcaptures that into the psql variable:col_exists. -
\if :col_existsbranches on the variable. If the column exists, log "SKIP"; otherwise apply the ALTER. - Same pattern for the index: query
pg_indexes, capture into:idx_exists, branch. - The script can be re-run any number of times. Subsequent runs find both
col_exists = tandidx_exists = t, and skip both operations — idempotent. -
CREATE INDEX CONCURRENTLYcannot run inside a transaction. We do not wrap this script with--single-transaction; the migration is sequence-of-statements rather than atomic.
Output. A migration that is safe to run, re-run, and partially complete. First run: applies both changes. Second run: both skipped. Crashes mid-way: only the unfinished change is left to retry.
Rule of thumb. Use the \gset + \if pattern for every migration that needs idempotency without a separate framework. The pattern fits in 40 lines per migration and is auditable in plain SQL.
Worked example — capturing a count and asserting an invariant
Detailed explanation. A common CI check is "after the migration, the row count must match the source." \gset captures both counts, then a \if asserts equality. Failures abort the run with a clear error.
Question. After bulk-loading events, verify the row count matches an expected value passed in as a variable.
Input. A psql session with \set expected_rows 4218394 already set (or passed in via -v expected_rows=4218394).
Code.
-- assert_event_count.sql
\set ON_ERROR_STOP on
SELECT count(*) AS actual_rows FROM events \gset
\echo Expected: :expected_rows
\echo Actual: :actual_rows
SELECT (:actual_rows = :expected_rows)::text AS match \gset
\if :match
\echo OK — event count matches
\else
\echo MISMATCH — aborting
SELECT 1/0; -- forces an error → ON_ERROR_STOP aborts the script
\endif
Step-by-step explanation.
-
\set ON_ERROR_STOP onmakes any error fatal. - The first SELECT counts rows;
\gsetcaptures the result into:actual_rows. -
\echoprints both variables to stderr for the CI log. - The second SELECT compares them as text and
\gsets the result into:match(eithertorf). - The
\if :matchbranch acknowledges success. - The
\elsebranch raises a deliberate error (SELECT 1/0) soON_ERROR_STOPkicks in and the script exits non-zero. CI now fails the build cleanly.
Output. A 12-line CI assertion that either logs OK — event count matches or aborts with a non-zero exit code and a clear message.
Rule of thumb. Any post-migration assertion can use this pattern. Capture both sides with \gset, compare as text, branch on the boolean, raise a real error to abort. No external test framework needed.
Worked example — quoting modes — :var vs :'var' vs :"var"
Detailed explanation. The three quoting modes are the source of half of psql scripting bugs. :var expands literally — for numbers and SQL fragments. :'var' adds string-literal quotes — for WHERE clause values. :"var" adds identifier quotes — for dynamic table names.
Question. Given \set table users, \set email 'me@x.com', and \set limit 5, write a query that selects from the table named in :table, filters where email matches :email, and limits to :limit rows. Use the right quoting form for each.
Input. A psql session with three variables set.
Code.
\set table users
\set email 'me@x.com'
\set limit 5
-- WRONG — :table expands to bare `users`, OK; :email expands to bare me@x.com, BAD (syntax error)
SELECT * FROM :table WHERE email = :email LIMIT :limit;
-- RIGHT — quote forms picked by data type
SELECT *
FROM :"table" -- identifier quotes: SELECT * FROM "users"
WHERE email = :'email' -- string literal quotes: WHERE email = 'me@x.com'
LIMIT :limit; -- bare expansion: LIMIT 5
-- Expanded (what the server sees):
-- SELECT * FROM "users" WHERE email = 'me@x.com' LIMIT 5;
Step-by-step explanation.
-
:tableexpands tousersliterally. For an identifier, this works only ifusersis a valid bare identifier (no spaces, no reserved word). The:"table"form wraps it in double quotes — safe for any identifier. -
:emailexpands tome@x.comliterally — which is not valid SQL because the string is not quoted.:'email'expands to'me@x.com'— proper string literal. -
:limitexpands to5— a bare integer literal.LIMIT 5is the right SQL. - Mistaking these is a common bug:
WHERE email = :emailproducesWHERE email = me@x.com, which the parser interprets as comparing to themecolumn, fails parse, and (withoutON_ERROR_STOP) might silently continue.
Output. The expanded query reads naturally; the quoting modes do their jobs without per-line thought.
Rule of thumb. Numbers → :var. Strings → :'var'. Table or column names → :"var". Three quoting forms, three jobs — memorise the pairing.
Worked example — CI-friendly output — --csv and --tuples-only
Detailed explanation. When psql output feeds another tool (jq, awk, pandas, a notifier), the default aligned-table format is unusable. The --csv, --tuples-only, and --no-align flags strip headers, footers, and padding to give clean machine-readable output.
Question. Write a one-liner that prints a single number (the order count) suitable for use in a shell variable.
Input. A shell, psql in the path.
Code.
# WRONG — full table output, multi-line, includes headers and a "(1 row)" footer
$ psql -h db.prod -d warehouse -c "SELECT count(*) FROM orders"
count
-------
4218
(1 row)
# RIGHT — tuples-only + no-align gives one bare value
$ psql -h db.prod -d warehouse -t -A -c "SELECT count(*) FROM orders"
4218
# Capture into a shell variable
$ COUNT=$(psql -h db.prod -d warehouse -t -A -c "SELECT count(*) FROM orders")
$ echo "There are $COUNT orders."
There are 4218 orders.
# CSV for multi-column queries
$ psql -h db.prod -d warehouse --csv \
-c "SELECT region, count(*) FROM orders GROUP BY region"
region,count
EU,2100
US,1900
NULL,218
Step-by-step explanation.
- The default output is human-readable but has headers, padding, and a row-count footer — none of which is useful for a shell variable.
-
-t(tuples-only) drops the header and footer. -
-A(no-align) drops column padding. Combined with-t, we get one bare value per row. -
$( ... )in bash captures stdout into a variable. The single-line, untrimmed output is now a clean integer. - For multi-column results,
--csvgives proper CSV with a header row — feed it tocut,awk, orpandas.read_csv.
Output. A pipeline-ready bare value (or CSV) instead of a human-readable table.
Rule of thumb. Any shell script that consumes psql output should pass -tA (single value) or --csv (multi-column). The default aligned format is for humans only.
PostgreSQL interview question on idempotent migrations in CI
A senior interviewer might ask: "Write a CI-runnable Postgres migration script that adds a column, backfills it from another column, builds an index, and is safe to re-run. Walk me through the failure modes." This is the operational-scripting probe.
Solution Using \gset + \if + ON_ERROR_STOP + --single-transaction
#!/usr/bin/env bash
# run_migration.sh — idempotent, transactional where possible
set -euo pipefail
PGHOST=db.prod
PGUSER=migrator
PGDATABASE=warehouse
export PGHOST PGUSER PGDATABASE
# Phase A — atomic schema change inside a single transaction
psql --single-transaction -v ON_ERROR_STOP=1 <<'EOF'
\timing on
-- Check + add column
SELECT exists(
SELECT 1 FROM information_schema.columns
WHERE table_schema='public' AND table_name='users' AND column_name='full_name'
) AS col_exists \gset
\if :col_exists
\echo SKIP — column users.full_name already exists
\else
ALTER TABLE users ADD COLUMN full_name text;
UPDATE users SET full_name = first_name || ' ' || last_name
WHERE full_name IS NULL;
\endif
EOF
# Phase B — non-transactional CONCURRENTLY index build
psql -v ON_ERROR_STOP=1 <<'EOF'
\timing on
SELECT exists(
SELECT 1 FROM pg_indexes
WHERE schemaname='public' AND indexname='users_full_name_idx'
) AS idx_exists \gset
\if :idx_exists
\echo SKIP — index users_full_name_idx already exists
\else
CREATE INDEX CONCURRENTLY users_full_name_idx ON users(full_name);
\endif
EOF
echo "Migration complete."
Step-by-step trace.
| Phase | Wrap | Why |
|---|---|---|
| A — add column + backfill | --single-transaction |
both succeed together or neither happens |
| B — index build | bare (no transaction) |
CREATE INDEX CONCURRENTLY cannot run inside a transaction |
The split is the canonical pattern: atomic schema work goes in phase A; index work goes in phase B. Both phases are idempotent via \gset + \if.
Output:
| Re-run scenario | What happens |
|---|---|
| First run | column added, rows backfilled, index built |
| Second run | both \if branches skip; logs "SKIP" twice |
| Crash mid-backfill | phase A rolls back; re-run completes both |
| Crash mid-index | phase A is done; phase B re-runs and continues the partial index from PG's perspective |
Why this works — concept by concept:
- \gset captures real DB state — every idempotency check queries the live catalog. No external "applied migrations" table required.
- \if branches client-side — psql evaluates the boolean and either runs the SQL block or skips it. The server only sees the SQL that actually runs.
- --single-transaction wraps phase A — the ALTER and the UPDATE are atomic. If the backfill fails halfway, the column add is rolled back too.
- ON_ERROR_STOP=1 in both phases — any unhandled error aborts immediately. CI sees a non-zero exit; the next migration does not run.
- CONCURRENTLY in phase B — index build does not lock writers. Bare (no transaction) because PostgreSQL forbids it inside one. The check + build pair is still idempotent.
- Cost — O(1) catalog queries for the checks; O(rows) for the ALTER/UPDATE and the index build. Same cost as any one-shot migration, with idempotency added for free.
SQL
Topic — SQL
SQL scripting and migration problems
Cheat sheet — psql power-user recipes
-
List tables in a schema.
\dt public.*— glob form.\dt+ public.*adds size and description. -
Describe a table with sizes.
\d+ orders— columns, types, indexes, partitions, total relation size, descriptions. -
Time every query.
\timing oninteractively, or\timing onin~/.psqlrcfor every session. -
Vertical output for wide rows.
\xto toggle expanded display, or\pset expanded autoin~/.psqlrc. -
Load a CSV — client-side.
\copy orders FROM 'orders.csv' WITH (FORMAT CSV, HEADER). -
Load a CSV — server-side.
COPY orders FROM '/var/lib/pg/orders.csv' WITH (FORMAT CSV, HEADER)(superuser only). -
Export to CSV.
\copy (SELECT * FROM orders WHERE order_date >= '2026-01-01') TO 'out.csv' WITH (FORMAT CSV, HEADER). -
Export to gzipped CSV.
\copy (SELECT * FROM orders) TO PROGRAM 'gzip > out.csv.gz' WITH (FORMAT CSV, HEADER). -
Stream a table between databases.
psql -h src -c "COPY t TO STDOUT BINARY" | psql -h dst -c "COPY t FROM STDIN BINARY". -
Switch database in-session.
\c analytics— reconnects with the same role and host. -
Switch role and host.
\c "host=db.staging user=etl dbname=warehouse"— full key-value reconnect. -
Run a file with abort-on-error.
psql -v ON_ERROR_STOP=1 -f migrate.sql. -
Wrap a file in a transaction.
psql --single-transaction -v ON_ERROR_STOP=1 -f migrate.sql. -
Pass a variable from the shell.
psql -v table=orders -c 'SELECT count(*) FROM :"table"'. -
Capture a query row into vars.
SELECT count(*) AS n FROM t \gsetthen\echo :n. -
Branch on DB state.
\if :n > 0...\else...\endif. -
Show the underlying SQL of any \d command.
psql -E— every meta-command prints its expansion. -
List active sessions.
SELECT pid, usename, state, query FROM pg_stat_activity WHERE state='active';plus\x autofor readability. -
Find longest-running query.
SELECT pid, now() - query_start AS dur, query FROM pg_stat_activity WHERE state='active' ORDER BY query_start LIMIT 5;. -
One-bare-value to a shell var.
COUNT=$(psql -tA -c "SELECT count(*) FROM orders"). -
CSV output to stdout.
psql --csv -c "SELECT region, count(*) FROM orders GROUP BY region". -
JSON output.
\pset format jsonthen run any SELECT — each row becomes a JSON object. -
Run a shell command from psql.
\! ls -la— escape to the shell without exiting. -
Set the visible app name.
PGAPPNAME=etl-nightly psql ...shows up inpg_stat_activity.application_name. - Cancel a runaway query. Ctrl-C in psql sends a cancel request; the server backend stops at the next safe point.
Frequently asked questions
COPY vs \copy — which should I use?
Use \copy for any remote or managed database (RDS, Aurora, Cloud SQL, Azure DB for PostgreSQL) — it streams the file over the wire from the client and runs as your connecting role, no superuser required. Use server-side COPY ... FROM '/path' only when psql is running on the same host as the database and you have access to pg_read_server_files (or you're superuser). The format options (FORMAT CSV, HEADER, DELIMITER, etc.) are identical between the two; only the file location and permission model differ. As a rule, if your file is on the same machine as the database, prefer COPY for a small speed win; otherwise \copy is the only option.
Where do I store my psql password safely?
Put it in ~/.pgpass with mode chmod 600. Format is host:port:database:user:password (one per line, * as wildcard). libpq silently ignores any .pgpass not owned by you with mode 600 — no warning, no error — so check ls -la ~/.pgpass if .pgpass is being ignored. For production CI, use short-lived IAM tokens generated by aws rds generate-db-auth-token or gcloud sql generate-login-token and pipe into PGPASSWORD; this avoids static passwords in CI configs. Never commit .pgpass to git, and never set PGPASSWORD in a shell history file.
How do I run a SQL file from the shell and abort on first error?
psql -v ON_ERROR_STOP=1 -f migrate.sql is the canonical CI invocation. The -v ON_ERROR_STOP=1 flag tells psql to exit non-zero on the first SQL or meta-command error — without it, psql logs the error and continues, leaving the database in a half-migrated state. Pair with --single-transaction to wrap the whole file in BEGIN; ... COMMIT; so any failure rolls everything back atomically (except statements like CREATE INDEX CONCURRENTLY that cannot run inside a transaction — those need to live in a separate, non-transactional script).
Why are my \set variables not visible inside DO $$ ... $$ blocks?
Because \set variables live on the client and are expanded before the SQL is sent to the server. By the time the server parses the DO $$ ... $$ block, the :var markers inside have already been substituted with their text values — there is no further substitution inside the dollar-quoted string. So \set name 'Alice' followed by DO $$ BEGIN RAISE NOTICE 'Hello :name'; END $$; does NOT print "Hello Alice" — it prints "Hello :name" literally because the substitution happens outside the dollar-quoted string. The fix is to put the variable expansion outside the DO block: DO $$ BEGIN RAISE NOTICE 'Hello %', ':name'; END $$; becomes DO $$ BEGIN RAISE NOTICE 'Hello %', ' :'name' '; END $$; — clunky. Most teams just use server-side variables via SET LOCAL or pass parameters via EXECUTE instead.
Can I use psql with a connection pooler like PgBouncer?
Yes, but with caveats depending on the pooler's mode. In session mode, PgBouncer dedicates a backend to each client connection for its whole life — psql features work normally, including \set, SET statements, prepared statements, and LISTEN/NOTIFY. In transaction mode (the most common production setup), PgBouncer hands you a different backend after every COMMIT — so \set variables are still fine (client-side), but server-side SET, LISTEN, prepared statements, and temp tables all break because the backend rotates. In statement mode even multi-statement transactions break. As a rule, develop against direct connections and let CI / app servers go through the pooler, and ask your DBA which mode the pooler runs in before relying on any session-state feature.
How do I get psql to output JSON or CSV for downstream pipelines?
For CSV use --csv (psql 12+) which produces proper RFC-4180 CSV with a header row: psql --csv -c "SELECT region, count(*) FROM orders GROUP BY region". For a single bare value (no header, no footer), use -tA (tuples-only, no-align): psql -tA -c "SELECT count(*) FROM orders" returns just the integer. For JSON, use \pset format json interactively or --no-align --tuples-only plus a SELECT that wraps the row in to_jsonb(): psql -tA -c "SELECT to_jsonb(t) FROM (SELECT ...) t". For NDJSON streams (one JSON object per line, ideal for log pipelines), the to_jsonb + -tA combination is the canonical shape. All three forms stream row-by-row, so memory cost stays constant even for millions of rows.
Practice on PipeCode
- Drill the database practice library → for PostgreSQL inspection and admin probes.
- Rehearse on SQL practice problems → when the interviewer wants raw SQL fluency.
- Sharpen ETL practice drills → for the bulk-load and pipeline patterns.
- Layer the aggregation library → for the COUNT(*) vs COUNT(col) family.
- Stack the joins practice library → for the LEFT-vs-INNER outcomes that come up in PG interviews.
- For the broader surface, read top data engineering interview questions →.
- Stack the prerequisites with the only 5 skills you need to become a data engineer →.
- For PostgreSQL types and tuning context, read SQL data types — the PostgreSQL guide →.
- Sharpen the dialect axis with the SQL for data engineering interviews course →.
- For long-form pipeline design, work through the ETL system design course →.
Pipecode.ai is Leetcode for Data Engineering — every psql recipe above pairs with hands-on practice rooms where you write the `\copy`, the `\gset`-driven migration, and the `pg_stat_statements` diagnostic against real graded inputs. PipeCode pairs every reading with 450+ DE-focused problems and a real-time scoring engine, so your `psql commands` muscle memory survives the day the interviewer says "show me how you'd debug this in psql."





Top comments (0)