DEV Community

Cover image for psql Command Reference for Data Engineers: Connect, \copy, Bulk-Load, Inspect
Gowtham Potureddi
Gowtham Potureddi

Posted on

psql Command Reference for Data Engineers: Connect, \copy, Bulk-Load, Inspect

psql commands look like cryptic backslash incantations until the day they become the single most productive surface in the entire PostgreSQL stack. Senior data engineers live in psql — not pgAdmin, not DBeaver — because psql is a real REPL with auto-completion, query history, transactions, scripting, and a bulk-load primitive that streams gigabytes over the wire without ever touching the server filesystem.

This reference walks through the four jobs psql does better than any GUI — inspection (\d+), bulk-load (\copy), connection (.pgpass + pg_service.conf + IAM), and scripting (\set, \gset, \if, ON_ERROR_STOP) — with the dialect-by-dialect comparison of COPY vs \copy, the four sslmode levels, and the CI flags every migration runner depends on. Each section pairs a teaching block with a Solution-Tail interview answer — code, a step-by-step trace, an output table, then a concept-by-concept breakdown of why it works.

PipeCode blog header for psql command reference for data engineers — bold white headline 'psql Commands · Cheat Sheet' with subtitle 'connect · \copy · bulk-load · inspect · script' and a stylised meta-command card on a dark gradient with purple, green, orange, and blue accents and a small pipecode.ai attribution.

When you want hands-on reps immediately after reading, drill the database practice library →, rehearse on SQL practice problems →, and stack the bulk-load muscles with ETL practice drills →.


On this page


1. Why psql is the data engineer's secret weapon

psql is a thin client over libpq — every meta-command runs locally, every SQL statement crosses the wire

The one-sentence invariant: psql commands prefixed with a backslash (\dt, \copy, \set, \i) are evaluated entirely on the client side and never reach the PostgreSQL server — everything else is a SQL statement sent over the libpq protocol. Once you internalise that split, the entire surface of psql becomes a clean mental model: the backslash universe is your scripting host, the SQL universe is the database.

Why every senior PG engineer lives in psql, not pgAdmin.

  • Speed. A GUI takes three clicks and 600 ms to expand a table's column list. \d+ orders returns the same information in 30 ms with table size, index list, and row estimate included.
  • Scriptability. GUIs do not version-control well. A psql session with \i migrate.sql can be replayed in CI bit-for-bit identically. The CI flag -v ON_ERROR_STOP=1 makes any failing statement abort the whole run.
  • Honesty. GUIs hide the wire-protocol details that matter — search_path, current role, SSL mode, connection pooler in front. psql shows you all of them via \conninfo.
  • Composability. psql plays well with shell pipes: psql -c "SELECT ..." | jq, pg_dump | psql -h replica, psql -f migrate.sql | tee migrate.log.

The three superpowers in one sentence each.

  • Introspection\d+ table returns columns, types, indexes, sizes, and statistics in one screen. Replaces three GUI panels.
  • Bulk-load\copy table FROM 'file.csv' WITH (FORMAT CSV, HEADER) streams a file from the client machine to the database over the wire. No filesystem access required on the server.
  • Scripting\set, \gset, \if, \elif, \else, \endif turn psql into a real conditional script host that runs the same on a laptop as in CI.

Where psql fits in a modern stack.

  • CI scripts. Every dbt-style or Flyway-style migration framework eventually shells out to psql -v ON_ERROR_STOP=1 -f file.sql. Knowing the CI flags pays for itself the first time a migration fails halfway through.
  • Ad-hoc analysis. Senior DEs reach for psql before notebooks for "answer this question in the next five minutes" work — auto-complete + history + \timing on beats every other surface.
  • Bulk-loading. Loading a 50 GB CSV via the application layer is a recipe for a five-hour ingest. \copy plus pv for a progress bar plus a quick SET synchronous_commit = off runs the same load in 18 minutes.

The 2026 reality.

  • \copy is the only bulk-load primitive you should reach for in a cloud-hosted database — RDS, Aurora, Cloud SQL, Azure Database for PostgreSQL all forbid server-side COPY ... FROM '/path' because the client has no filesystem on the DB host.
  • .pg_service.conf is the modern way to name connection profiles. psql service=prod-warehouse is shorter than retyping a 200-character connection string and survives a password rotation by editing one file.
  • \gset is criminally underused — capturing a query result into a psql variable is how you write idempotent migrations and CI scripts that branch on real data.
  • SET synchronous_commit = OFF before a bulk load can cut wall-clock time by 30 to 60 percent on RDS without sacrificing crash-recovery correctness.

Worked example — psql in the wild, six commands that replace a GUI session

Detailed explanation. A new DE inherits a Postgres database with 30 schemas and 4,000 tables. The task is "find the largest table in the analytics schema, look at its columns, and check its indexes." A GUI requires a tree expansion, a properties dialog, and a separate index view. psql does the whole job in six commands.

Question. Walk through the six-command psql session that lists schemas, switches into one, lists tables ranked by size, and inspects the biggest table's columns and indexes.

Input. The terminal connected to the database with psql -h db.prod -U analytics -d warehouse.

Code.

warehouse=> \dn
       List of schemas
  Name    |     Owner
----------+----------------
 public   | postgres
 analytics| analytics_owner

warehouse=> \dt analytics.*
                List of relations
  Schema   |     Name      | Type  |   Owner
-----------+---------------+-------+----------
 analytics | events        | table | analytics_owner
 analytics | orders        | table | analytics_owner
 analytics | sessions_daily| table | analytics_owner

warehouse=> SELECT relname, pg_size_pretty(pg_total_relation_size('analytics.' || relname)) AS size
   ->      FROM pg_stat_user_tables WHERE schemaname = 'analytics'
   ->      ORDER BY pg_total_relation_size('analytics.' || relname) DESC LIMIT 5;
     relname     |  size
-----------------+--------
 events          | 412 GB
 orders          | 58 GB
 sessions_daily  | 12 GB

warehouse=> \d+ analytics.events
                                Table "analytics.events"
   Column   |     Type      | Nullable | Default | Storage |   Description
------------+---------------+----------+---------+---------+----------------
 event_id   | bigint        | not null |         | plain   | surrogate key
 user_id    | bigint        |          |         | plain   |
 event_time | timestamptz   | not null |         | plain   |
 payload    | jsonb         |          |         | extended|
Indexes:
    "events_pkey" PRIMARY KEY, btree (event_id)
    "events_user_id_idx" btree (user_id)
    "events_event_time_idx" btree (event_time)
Partitions: events_2026_01 ...

warehouse=> \timing on
Timing is on.

warehouse=> SELECT count(*) FROM analytics.events WHERE event_time >= NOW() - INTERVAL '1 day';
   count
-----------
  4218394
Time: 412.318 ms
Enter fullscreen mode Exit fullscreen mode

Step-by-step explanation.

  1. \dn lists schemas — the equivalent of expanding the database node in a GUI tree.
  2. \dt analytics.* lists tables in that schema. The asterisk is psql glob — \dt analytics.events* would filter to a prefix.
  3. A SQL query against pg_stat_user_tables ranks tables by total size (including indexes and TOAST). This is impossible in most GUIs without third-party tooling.
  4. \d+ analytics.events prints columns, types, nullability, defaults, storage type, descriptions, indexes, and any partitions — the GUI-equivalent of three panels at once.
  5. \timing on enables wall-clock measurement of every subsequent query. The DE now knows the COUNT scanned 4.2 M rows in 412 ms — a useful signal for whether the index is being used.
  6. The session is repeatable: pipe these into a .sql file, run with psql -f, and you have a reproducible inspection script.

Output. Same data a GUI would produce, plus query timing — all in plain text that can be copy-pasted into a ticket.

Rule of thumb. Any time you reach for a GUI to inspect a Postgres database, ask "is there a \d for this?" — the answer is almost always yes, and psql will be twice as fast.

Worked example — psql vs sql client SDK, the wire-protocol perspective

Detailed explanation. Junior engineers sometimes argue that psycopg2 or the Node pg library is "the same thing" as psql. It is not. psql is libpq plus a backslash-prefixed REPL that runs the meta-commands locally. SDKs are libpq without the REPL — so they cannot execute \dt or \copy because those are client-side constructs, not SQL.

Question. Show why \dt works in psql but raises a syntax error if you send it through psycopg2's cursor.execute(). Translate \dt analytics.* into the equivalent SQL against pg_catalog.

Input. A connected psql session and a connected psycopg2 connection.

Code.

# WRONG — \dt is psql-only; the server never sees it
import psycopg2
conn = psycopg2.connect("dbname=warehouse user=analytics")
cur = conn.cursor()
cur.execute("\\dt analytics.*")
# raises: syntax error at or near "\"

# RIGHT — execute the SQL that \dt expands to
cur.execute("""
    SELECT n.nspname AS schema, c.relname AS table_name
    FROM pg_class c
    JOIN pg_namespace n ON n.oid = c.relnamespace
    WHERE c.relkind = 'r' AND n.nspname = 'analytics'
    ORDER BY c.relname;
""")
print(cur.fetchall())
Enter fullscreen mode Exit fullscreen mode

Step-by-step explanation.

  1. psql intercepts every line beginning with a backslash. \dt is rewritten on the client into a query against pg_class and pg_namespace, then the result is formatted with psql's table renderer.
  2. psycopg2 sends raw bytes to the server. The PostgreSQL parser sees \dt analytics.* and raises a syntax error — there is no SQL keyword \dt.
  3. The "right" version explicitly queries the catalog tables psql would have queried. This is exactly the SQL psql sends when you type \dt analytics.*.
  4. To see the SQL psql would generate, start psql with -E (echo) — every meta-command prints the underlying SQL before executing. Invaluable for porting psql habits into SDK code.

Output. The Python SDK version returns the same row list, but you wrote the SQL yourself. The psql version is six characters; the SDK version is twelve lines.

Rule of thumb. When automation needs the same data a meta-command shows, run psql with -E once to capture the underlying SQL, then embed that SQL in your SDK code. Never try to ship a backslash command across the wire — it cannot work.

Worked example — repeatable ad-hoc analysis with .psqlrc

Detailed explanation. Every senior PG engineer keeps a ~/.psqlrc file with their preferred defaults — \timing on, prompt format, error verbosity, output null marker. New tabs open with the same setup every time. This is the analyst equivalent of dotfiles.

Question. Write a .psqlrc that enables timing, sets NULL to display as the literal (visible), enables verbose error output, sets a multi-line continuation prompt, and aborts the session on error in script mode.

Input. The file ~/.psqlrc (read on every psql startup).

Code.

-- ~/.psqlrc
\set QUIET 1                       -- silent until the prompt
\timing on                          -- wall-clock for every query
\pset null '␀'                      -- visible NULL marker
\set VERBOSITY verbose              -- full error detail
\set HISTSIZE 100000                -- big query history
\set PROMPT1 '%[%033[1;33m%]%n@%/%R%#%[%033[0m%] '
\set PROMPT2 '   ... > '
\set ON_ERROR_STOP on               -- abort scripts on first error
\set COMP_KEYWORD_CASE upper
\unset QUIET
\echo 'psql session started — \\timing on, NULL=␀, VERBOSITY=verbose'
Enter fullscreen mode Exit fullscreen mode

Step-by-step explanation.

  1. \set QUIET 1 silences the welcome banner while .psqlrc runs. \unset QUIET at the end restores normal verbosity for the user session.
  2. \timing on makes every query report wall-clock time. The very first time you forget to add this and run a slow query, you will wish it had been on.
  3. \pset null '␀' replaces invisible NULLs with a visible Unicode codepoint, so you can see the difference between empty string and NULL at a glance.
  4. \set ON_ERROR_STOP on is global — every script run from this session, including via \i file.sql, will abort on the first error. The CLI flag -v ON_ERROR_STOP=1 is what scripts use; this is the interactive equivalent.
  5. The prompt formats include the user, the current database (%/), and the transaction state marker (%R%#) — so you always know whether you are mid-transaction in a CTE-heavy session.

Output. Every new psql tab starts with timing on, errors verbose, NULLs visible, and a colourful prompt — zero per-session muscle memory required.

Rule of thumb. A six-line ~/.psqlrc saves a senior DE ten minutes a day. Commit it to your dotfiles repo and pair-program with future-you.

PostgreSQL interview question on the psql vs server boundary

A senior interviewer often opens with: "Explain what happens when you type \dt in psql versus typing SELECT 1 — which one crosses the wire, what does the server see, and how would you prove it?" This blends the libpq protocol mental model with practical debugging.

Solution Using psql -E to expose the underlying SQL

# Start psql with -E to echo every meta-command's underlying SQL
psql -E -h db.prod -U analytics -d warehouse

warehouse=> \dt analytics.*
********* QUERY **********
SELECT n.nspname as "Schema",
       c.relname as "Name",
       CASE c.relkind WHEN 'r' THEN 'table' END as "Type",
       pg_catalog.pg_get_userbyid(c.relowner) as "Owner"
FROM pg_catalog.pg_class c
LEFT JOIN pg_catalog.pg_namespace n ON n.oid = c.relnamespace
WHERE c.relkind IN ('r','p','')
  AND n.nspname OPERATOR(pg_catalog.~) '^(analytics)$' COLLATE pg_catalog.default
  AND pg_catalog.pg_table_is_visible(c.oid)
ORDER BY 1,2;
**************************

                List of relations
  Schema   |     Name      | Type  |   Owner
-----------+---------------+-------+----------
 analytics | events        | table | analytics_owner
 analytics | orders        | table | analytics_owner
Enter fullscreen mode Exit fullscreen mode

Step-by-step trace.

Command typed What psql does What the server sees
\dt analytics.* rewrites into a pg_class / pg_namespace query the rewritten SQL above
SELECT 1 sends literally SELECT 1
\timing on flips a client flag nothing — server never notified
\copy t FROM 'f.csv' CSV HEADER opens local file, issues COPY t FROM STDIN CSV HEADER, streams bytes COPY t FROM STDIN CSV HEADER plus the byte stream
\set var 'value' stores in client memory nothing

The -E (echo) flag is the canonical proof — every meta-command prints its expansion before executing, so you can verify exactly what the server sees.

Output:

Surface Crosses wire? Server sees
Backslash meta-commands yes, as SQL or COPY the expanded SQL
Pure backslash settings (\timing, \pset, \set) no nothing
Bare SQL yes the SQL as typed
\copy yes COPY ... FROM STDIN and the byte stream

Why this works — concept by concept:

  • Client / server boundary — psql is a libpq client. Anything beginning with a backslash is a client-side instruction; the server only ever receives SQL strings (or COPY byte streams).
  • psql -E echoes meta-commands — debugging trick to expose the underlying SQL; pair it with \timing on to also see how the catalog query performs.
  • pg_class + pg_namespace — the catalog tables \dt and \d query under the hood. Every introspection meta-command can be reproduced as a query against pg_catalog.
  • \copy is special — it does cross the wire (the bytes go to the server's COPY FROM STDIN), but the file open happens on the client. That is what makes it the right tool for remote databases.
  • Cost — meta-command rewrites are constant-cost on the client; the SQL expansion costs the same as any catalog query. No special server load.

SQL
Topic — database
PostgreSQL database problems (SQL)

Practice →


2. The top 30 meta-commands every PG engineer should memorise

psql commands map to five jobs — meta / help, navigation, inspection, copy / output, scripting — master these and you live in psql instead of a GUI

The mental model in one line: psql's 30 most-used backslash commands cluster into five categories, each replacing what a GUI does in three to five clicks. Once you know the five categories cold, every other meta-command becomes a \? away from being memorised.

Visual cheat sheet of the top 30 psql meta-commands grouped into five categories — meta/help, navigation, inspection, copy/output, and scripting — each category a coloured column with command pills and a one-line description; on a light PipeCode card.

The five-jobs model.

Job Commands What it replaces
Meta / help \?, \h, \q, \conninfo, \encoding, \! the GUI "About this connection" dialog
Navigation \c, \l, \dn, \dt, \dv, \du the tree expansion in the GUI sidebar
Inspection \d, \d+, \sf, \sv, \dx, \df the table / view / function "properties" dialog
Copy / output \x, \timing, \copy, \pset, \H export-to-CSV, query-timing, expanded-display
Scripting \i, \ir, \set, \gset, \if, \echo the SQL editor's "run script" button + variable substitution

The 30 commands in five tiers.

  • Tier 1 — quit, help, connect. \q quit; \? meta-command help; \h SQL syntax help; \conninfo current connection details; \c dbname switch database; \c "host=... user=..." reconnect with new connection string; \! cmd shell escape.
  • Tier 2 — list everything. \l list databases; \dn list schemas; \dt list tables; \dv list views; \df list functions; \du list roles; \dx list extensions; \d list everything in the current schema.
  • Tier 3 — describe. \d table columns + indexes; \d+ table everything + sizes + descriptions; \sf func function source; \sv view view source; \dp table privileges.
  • Tier 4 — output and timing. \x toggle expanded display (vertical); \timing on wall-clock per query; \pset format csv|aligned|html|json switch output format; \pset null '␀' how NULLs display; \H HTML output mode; \o file send output to file.
  • Tier 5 — scripting. \i file.sql include a SQL file; \ir file.sql include relative to current file; \set var value set client variable; \gset capture query result into vars; \if :cond ... \elif ... \else ... \endif branching; \echo :var print to stderr; \copy table FROM 'f.csv' CSV HEADER client-side bulk-load.

Two patterns that matter.

  • The \d family glob pattern. \dt schema.prefix* lists every table in schema whose name starts with prefix. Works for \dv, \df, \d, \dx — anywhere you can pass a name.
  • The + suffix. \d+, \l+, \dn+, \dt+ all add "extra" columns — sizes, descriptions, owner info. Default to the + form when investigating.

Worked example — investigate a slow query end-to-end with only meta-commands

Detailed explanation. A nightly ETL job degraded from 4 minutes to 28 minutes. The root cause investigation needs to identify the slow query, check its plan, and verify whether the right index exists. The entire workflow fits in psql.

Question. Walk through the eight-command psql session that uses pg_stat_statements, EXPLAIN, \d+, and \timing to find a slow query, examine its plan, confirm the missing index, and verify the fix.

Input. The terminal connected to the database with pg_stat_statements enabled.

Code.

warehouse=> \dx pg_stat_statements
                             List of installed extensions
        Name        | Version | Schema |          Description
--------------------+---------+--------+----------------------------------------
 pg_stat_statements | 1.10    | public | track planning and execution statistics

warehouse=> SELECT query, calls, total_exec_time, mean_exec_time
   ->      FROM pg_stat_statements
   ->      ORDER BY total_exec_time DESC LIMIT 3;
                          query                          | calls | total_exec_time | mean_exec_time
---------------------------------------------------------+-------+-----------------+----------------
 SELECT * FROM orders WHERE customer_id = $1            |   240 |       1680000.5 |        7000.00
 SELECT * FROM events WHERE event_time >= $1            |    18 |        144320.0 |        8017.78
 SELECT count(*) FROM sessions_daily                    |   480 |         12000.0 |           25.00

warehouse=> \timing on
Timing is on.

warehouse=> EXPLAIN ANALYZE SELECT * FROM orders WHERE customer_id = 123;
                                                  QUERY PLAN
---------------------------------------------------------------------------------------------------
 Seq Scan on orders  (cost=0.00..285000.00 rows=12 width=88) (actual time=42.3..7012.1 rows=12 loops=1)
   Filter: (customer_id = 123)
   Rows Removed by Filter: 8000000
 Planning Time: 0.182 ms
 Execution Time: 7012.5 ms
Time: 7013.041 ms

warehouse=> \d+ orders
              Table "public.orders"
   Column    |   Type   | Nullable | ...
-------------+----------+----------+----
 order_id    | bigint   | not null |
 customer_id | bigint   |          |
 ...
Indexes:
    "orders_pkey" PRIMARY KEY, btree (order_id)
    "orders_order_date_idx" btree (order_date)

warehouse=> CREATE INDEX CONCURRENTLY orders_customer_id_idx ON orders(customer_id);
CREATE INDEX
Time: 41028.7 ms

warehouse=> EXPLAIN ANALYZE SELECT * FROM orders WHERE customer_id = 123;
                                                  QUERY PLAN
---------------------------------------------------------------------------------------------------
 Index Scan using orders_customer_id_idx on orders  (cost=0.43..38.92 rows=12) (actual time=0.041..0.18 rows=12 loops=1)
   Index Cond: (customer_id = 123)
 Planning Time: 0.122 ms
 Execution Time: 0.231 ms
Time: 0.412 ms
Enter fullscreen mode Exit fullscreen mode

Step-by-step explanation.

  1. \dx pg_stat_statements confirms the extension is installed. If it were not, CREATE EXTENSION pg_stat_statements; plus a config change would be needed.
  2. The query against pg_stat_statements ranks queries by total time. The top offender accounts for 1.68 M ms (28 minutes) — matches the ETL slowdown.
  3. \timing on makes the next EXPLAIN ANALYZE print wall-clock time alongside the planner's estimate.
  4. EXPLAIN ANALYZE confirms the sequential scan filter on 8 M rows. 7 seconds per query × 240 calls = 28 minutes total — matches.
  5. \d+ orders lists indexes — there is no index on customer_id. Confirmed missing.
  6. CREATE INDEX CONCURRENTLY builds the missing index without locking out writes. Takes 41 seconds.
  7. Re-run EXPLAIN ANALYZE — now an Index Scan at 0.231 ms. 7000× faster.
  8. Replay the ETL: should run in 4 minutes again.

Output. Five lines of psql produced a 30,000× per-query speedup. No GUI involved.

Rule of thumb. When a query slows down, the diagnostic path is always pg_stat_statementsEXPLAIN ANALYZE\d+ → fix → re-run EXPLAIN ANALYZE. Memorise the four steps and the four meta-commands; you will use them weekly.

Worked example — \dt glob patterns vs catalog queries

Detailed explanation. \dt accepts a glob pattern that matches schema and table names. The glob is identical to the LIKE patterns used by pg_catalog queries, with * as wildcard. Knowing both lets you pick the right tool — glob for interactive use, catalog query for scripts.

Question. Show the four equivalent ways to list every table in the analytics schema whose name starts with events_: glob, regex, catalog query, and information_schema query.

Input. A connected psql session.

Code.

-- 1) psql glob (interactive)
warehouse=> \dt analytics.events_*

-- 2) psql with regex via ~ operator (less common, but works)
warehouse=> \dt analytics.~events_.*

-- 3) Catalog query (portable to SDK)
warehouse=> SELECT n.nspname, c.relname
   ->      FROM pg_class c
   ->      JOIN pg_namespace n ON n.oid = c.relnamespace
   ->      WHERE n.nspname = 'analytics'
   ->        AND c.relname LIKE 'events_%'
   ->        AND c.relkind = 'r';

-- 4) information_schema query (ANSI-standard, slower)
warehouse=> SELECT table_schema, table_name
   ->      FROM information_schema.tables
   ->      WHERE table_schema = 'analytics'
   ->        AND table_name LIKE 'events_%'
   ->        AND table_type = 'BASE TABLE';
Enter fullscreen mode Exit fullscreen mode

Step-by-step explanation.

  1. The glob form is the shortest — 26 characters. Use this when typing interactively.
  2. psql also accepts a regex if you prefix with ~, but few engineers know this; stick to glob.
  3. The pg_catalog query is what \dt expands to. Portable to any SDK because it is just SQL. Filter c.relkind = 'r' for tables; use 'v' for views, 'i' for indexes, 'p' for partitioned tables, 'm' for materialised views.
  4. The information_schema form is ANSI-standard and works on every Postgres-compatible DB (CockroachDB, Yugabyte, etc.). Slightly slower because it does extra view-resolution work.

Output. All four return the same table list. Pick by context: interactive uses glob; SDK uses pg_catalog; multi-database tooling uses information_schema.

Rule of thumb. For ad-hoc use, glob. For automation, pg_catalog. For portable tools, information_schema. Run psql with -E once to see exactly which catalog query each \d command expands to.

Worked example — \x expanded display rescues wide rows

Detailed explanation. A row with 30 columns is unreadable in the default aligned-table output. \x toggles expanded display — each row prints as one column per line, vertically. The same data, infinitely more readable for any row wider than your terminal.

Question. Run a SELECT against pg_stat_activity (16 columns) in the default mode, then in expanded mode, and show why one is unreadable.

Input. A connected psql session.

Code.

-- Default — unreadable for wide rows
warehouse=> SELECT * FROM pg_stat_activity WHERE state = 'active' LIMIT 1;
 datid | datname | pid | usesysid | usename | application_name | client_addr | client_port | backend_start
-------+---------+-----+----------+---------+------------------+-------------+-------------+--------------
 16384 | wareh.. | 71  | 16385    | analy.. | psql             | 10.0.0.1    | 53412       | 2026-06-06 ..
(1 row, but wraps across the terminal)

-- Toggle expanded
warehouse=> \x
Expanded display is on.

warehouse=> SELECT * FROM pg_stat_activity WHERE state = 'active' LIMIT 1;
-[ RECORD 1 ]----+-----------------------
datid            | 16384
datname          | warehouse
pid              | 71
usesysid         | 16385
usename          | analytics
application_name | psql
client_addr      | 10.0.0.1
client_port      | 53412
backend_start    | 2026-06-06 09:12:00+00
state            | active
query            | SELECT count(*) FROM events;
Enter fullscreen mode Exit fullscreen mode

Step-by-step explanation.

  1. The default aligned format works for narrow tables but is unusable past 8 to 10 columns — the row wraps and you cannot tell which value belongs to which column.
  2. \x toggles the expanded display flag. Subsequent queries print one column per line.
  3. \x auto is even better — psql automatically expands when the row would exceed the terminal width. This is the recommended default.
  4. To make \x auto the global default, add \pset expanded auto to ~/.psqlrc.

Output. Same query, fundamentally different readability. The expanded form is the only sane way to inspect pg_stat_activity, pg_class, pg_stat_statements, or any other wide catalog table.

Rule of thumb. Put \pset expanded auto in your ~/.psqlrc. Every wide-table inspection auto-expands; every narrow one stays aligned. Zero per-query thought required.

PostgreSQL interview question on production database inspection

A senior interviewer might frame this as: "You SSH into a database server you've never seen. Give me the eight psql commands you'd run in the first 60 seconds to size up the cluster and find the largest table."

Solution Using a six-command inspection ritual

-- 1) Confirm the connection and current role
warehouse=> \conninfo
You are connected to database "warehouse" as user "analytics" via socket in "/var/run/postgresql" at port "5432".

-- 2) List databases by size
warehouse=> \l+
                                                       List of databases
   Name    |    Owner    | Encoding |  Collate   |   Ctype    |   Size  |    Description
-----------+-------------+----------+------------+------------+---------+-------------------
 warehouse | postgres    | UTF8     | en_US.UTF8 | en_US.UTF8 |  482 GB | analytics warehouse
 staging   | postgres    | UTF8     | en_US.UTF8 | en_US.UTF8 |   12 GB | staging
 postgres  | postgres    | UTF8     | en_US.UTF8 | en_US.UTF8 |   80 MB |

-- 3) List schemas in the largest DB
warehouse=> \dn+
                          List of schemas
   Name    |     Owner     | Access privileges  |  Description
-----------+---------------+--------------------+---------------
 analytics | analytics_owner | analytics_owner=UC | DE-owned mart
 public    | postgres      | postgres=UC        | default

-- 4) Top 10 tables by size, all schemas
warehouse=> SELECT schemaname || '.' || relname AS table,
   ->             pg_size_pretty(pg_total_relation_size(schemaname || '.' || relname)) AS size
   ->      FROM pg_stat_user_tables
   ->      ORDER BY pg_total_relation_size(schemaname || '.' || relname) DESC LIMIT 10;

-- 5) Look at the biggest one
warehouse=> \d+ analytics.events

-- 6) Active sessions and longest-running queries
warehouse=> \x auto
warehouse=> SELECT pid, usename, state, query_start, query
   ->      FROM pg_stat_activity
   ->      WHERE state = 'active'
   ->      ORDER BY query_start LIMIT 5;
Enter fullscreen mode Exit fullscreen mode

Step-by-step trace.

Step Command Insight gained
1 \conninfo I am on socket, role analytics, port 5432
2 \l+ three DBs, warehouse is 482 GB
3 \dn+ analytics schema owned by analytics_owner
4 pg_stat_user_tables query events is 412 GB, orders is 58 GB
5 \d+ analytics.events partitioned by month, three indexes, jsonb payload
6 pg_stat_activity query one query running 12 minutes — investigate

Six commands, 60 seconds, complete situational awareness. No GUI could match this in the same time.

Output:

Question Answered by
Where am I connected? \conninfo
What databases exist? \l+
What schemas exist? \dn+
What's the largest table? catalog query
What does it look like? \d+
What's running right now? pg_stat_activity

Why this works — concept by concept:

  • \conninfo first — never assume the role or database; the first line of an inspection script confirms what server you're on.
  • \l+ and \dn+ with the plus suffix — adds size and description columns. Always default to the + form during investigation.
  • pg_stat_user_tables — the catalog view aggregating per-table size including indexes and TOAST. The "show me the biggest tables" canonical query.
  • \d+ for one-stop inspection — columns, types, indexes, partitions, descriptions, sizes — all in one screen.
  • pg_stat_activity with \x auto — wide rows display vertically; expanded display turns 12 columns into 12 lines you can read.
  • Cost — every catalog query is O(rows in catalog), typically milliseconds even on a database with thousands of tables.

SQL
Topic — SQL
SQL practice problems

Practice →


3. COPY vs \copy — the bulk-load showdown

COPY runs on the server filesystem; \copy streams over the wire from the client — pick by where the file lives, not by what feels faster

The mental model in one line: COPY ... FROM '/path' reads or writes a file on the PostgreSQL server's filesystem; \copy ... FROM 'path' opens the file on the client machine and streams the bytes through psql to the server's COPY FROM STDIN. Once you internalise that one-sentence split, the entire COPY-vs-\copy decision collapses to "where is my file?"

Side-by-side comparison of server-side COPY and client-side \copy — left panel shows COPY needing server filesystem access and superuser, right panel shows \copy streaming over the wire with end-user permissions; with a decision card 'remote DB → \copy, same host → COPY' and a performance chip; on a light PipeCode card.

The decision matrix in one table.

Factor Server-side COPY Client-side \copy
File location PostgreSQL server filesystem machine running psql
Permission needed superuser OR pg_read_server_files any connecting role
Speed fastest — no client→server byte trip slower — bytes traverse libpq
Works on cloud-hosted PG? NO — RDS / Aurora / Cloud SQL forbid it YES — the only choice for managed PG
Format options CSV, TEXT, BINARY, plus WITH options identical
FROM PROGRAM yes (server runs the program) yes (client runs the program)
Use case bulk ETL from staged files on the DB host laptop → cloud, CI scripts, remote ingest

Format options that always apply.

  • FORMAT CSV — comma-separated, default delimiter ,, default null marker is empty string. Pair with HEADER to skip the first row.
  • FORMAT TEXT — tab-separated, default null marker \N. The PostgreSQL native format; fastest for raw text.
  • FORMAT BINARY — PostgreSQL's binary on-disk encoding. Fastest for raw throughput; only useful when both source and target are PostgreSQL.
  • WITH (FORMAT CSV, HEADER, DELIMITER ',', NULL '\N', QUOTE '"', ESCAPE '"') — the explicit form. Default everything if you trust the source.

Performance tuning for bulk loads.

  • Drop indexes first, rebuild after. Loading 50 GB into an indexed table is 5× slower than loading then CREATE INDEX after. The post-load build is bulk-optimised.
  • SET synchronous_commit = OFF — disables fsync per transaction; on RDS this cuts wall-clock by 30–60%. Crash-recovery is still safe; only the last few seconds of commits are at risk.
  • Disable triggers. ALTER TABLE t DISABLE TRIGGER ALL; — triggers fire per row. Disable for the load, re-enable after.
  • Use the BINARY format when both ends are PostgreSQL. Skips the ASCII-to-binary parse step.
  • UNLOGGED table. Drop the WAL write for the load: ALTER TABLE t SET UNLOGGED; — then ALTER TABLE t SET LOGGED; after. Saves another 30%.
  • COPY ... FREEZE. Marks newly-inserted tuples as frozen, skipping a later vacuum freeze. Requires the table to have been truncated or created in the same transaction.

COPY ... FROM PROGRAM and \copy ... FROM PROGRAM.

  • Server-side FROM PROGRAM. COPY t FROM PROGRAM 'curl -s https://...' — the server runs the program. Needs superuser; the program runs as the postgres OS user.
  • Client-side \copy ... FROM PROGRAM. Same syntax with a backslash. The program runs on the client machine as the user running psql. Streams the program's stdout into COPY FROM STDIN.
  • Use case. Bulk-load a CSV directly from S3 without a temp file: \copy t FROM PROGRAM 'aws s3 cp s3://bucket/file.csv -' WITH (FORMAT CSV, HEADER).

Worked example — load a 10 GB CSV into a remote RDS instance

Detailed explanation. RDS is a managed Postgres service — there is no shell on the database host, so server-side COPY is unavailable. The only path is client-side \copy. Pair with pv for a progress bar and SET synchronous_commit = OFF for speed.

Question. Write the full one-command pipeline to load a 10 GB CSV from a laptop into an RDS database, with a progress bar and pre-load tuning.

Input. A 10 GB CSV named events.csv on the laptop, an RDS endpoint db.cluster-x.us-east-1.rds.amazonaws.com.

Code.

# Pre-load tuning + load + post-load checks, in one psql session
psql "host=db.cluster-x.us-east-1.rds.amazonaws.com \
      port=5432 \
      user=analytics \
      dbname=warehouse \
      sslmode=verify-full" <<'EOF'
\timing on

-- 1) Pre-load: drop non-PK indexes, disable triggers, async commit
SET synchronous_commit = OFF;
ALTER TABLE events DISABLE TRIGGER ALL;
DROP INDEX IF EXISTS events_user_id_idx;
DROP INDEX IF EXISTS events_event_time_idx;

-- 2) Bulk-load via \copy
\copy events (event_id, user_id, event_time, payload) \
  FROM PROGRAM 'pv -L 100m events.csv' \
  WITH (FORMAT CSV, HEADER, NULL '\N')

-- 3) Post-load: rebuild indexes, re-enable triggers, analyze
CREATE INDEX CONCURRENTLY events_user_id_idx ON events(user_id);
CREATE INDEX CONCURRENTLY events_event_time_idx ON events(event_time);
ALTER TABLE events ENABLE TRIGGER ALL;
ANALYZE events;
EOF
Enter fullscreen mode Exit fullscreen mode

Step-by-step explanation.

  1. The connection string uses key-value form with sslmode=verify-full — the only safe SSL mode for prod. RDS supports TLS by default.
  2. SET synchronous_commit = OFF skips the fsync per transaction. On RDS this cuts wall-clock by 30–60%.
  3. DISABLE TRIGGER ALL plus dropping the non-PK indexes turns a 5-hour load into a 30-minute load. The cost is rebuilding the indexes after.
  4. \copy ... FROM PROGRAM 'pv -L 100m events.csv' streams the file through pv (progress bar, rate-limited to 100 MB/s to avoid saturating the laptop's uplink) and into the events table.
  5. CREATE INDEX CONCURRENTLY after the load means the index build is single-threaded but non-blocking. For an offline window, drop the CONCURRENTLY for a 3× faster build.
  6. ANALYZE events updates the planner stats so subsequent queries can pick the new indexes.

Output. A 10 GB ingest in 30 minutes with a live progress bar, on a remote managed database that explicitly forbids server-side COPY.

Rule of thumb. Any bulk-load to a cloud-hosted PG uses \copy + pv + the four-step tuning pattern (async commit, disable triggers, drop indexes, ANALYZE after). Memorise the pattern; it is the single most common 30-line shell script in a DE's toolkit.

Worked example — export a query to CSV with \copy ... TO

Detailed explanation. The export side of \copy takes a SELECT and streams the result rows as CSV to a local file. No intermediate tool needed. Pairs with gzip via shell pipe for compressed exports.

Question. Export every order from 2026 to a gzipped CSV on the laptop, using \copy plus FROM PROGRAM for the gzip step.

Input. A connected psql session to a remote DB; the orders table.

Code.

# Direct export
psql -h db.prod -U analytics -d warehouse -c "
  \\copy (
    SELECT order_id, customer_id, amount, order_date
    FROM orders
    WHERE order_date >= '2026-01-01'
  ) TO 'orders_2026.csv' WITH (FORMAT CSV, HEADER)
"

# Gzipped via shell pipe
psql -h db.prod -U analytics -d warehouse -c "
  \\copy (
    SELECT order_id, customer_id, amount, order_date
    FROM orders
    WHERE order_date >= '2026-01-01'
  ) TO PROGRAM 'gzip > orders_2026.csv.gz' WITH (FORMAT CSV, HEADER)
"

# Server-side equivalent (if you have shell access to the DB host)
psql -c "
  COPY (
    SELECT order_id, customer_id, amount, order_date
    FROM orders
    WHERE order_date >= '2026-01-01'
  ) TO '/tmp/orders_2026.csv' WITH (FORMAT CSV, HEADER)
"
Enter fullscreen mode Exit fullscreen mode

Step-by-step explanation.

  1. The first form opens orders_2026.csv on the laptop and writes CSV rows as they arrive from the server. Memory cost is constant — psql streams row-by-row.
  2. The FROM PROGRAM 'gzip > orders_2026.csv.gz' form sends the CSV stream to gzip's stdin. The compressed file lands on the laptop directly; no temp CSV ever exists.
  3. The third form uses server-side COPY and writes to /tmp on the DB host. Only works if you have shell access to that host (read: not RDS).
  4. (SELECT ...) in COPY/\copy lets you export an arbitrary query result, not just a table. Equivalent to writing a CTE then copying — but one line shorter.

Output. A gzipped CSV of every 2026 order, streamed end-to-end without an intermediate temp file. Compression saves 70–90% of the disk.

Rule of thumb. Default to \copy (SELECT ...) TO PROGRAM 'gzip > file.csv.gz' for any export over 1 GB. The 30% speed cost of gzip is worth the 80% disk savings.

Worked example — COPY (SELECT ...) TO STDOUT piping data between databases

Detailed explanation. The cleanest way to move a table from one Postgres to another is COPY ... TO STDOUT | psql -h target -c "COPY ... FROM STDIN". The bytes never touch disk; the two backends are connected end-to-end by a shell pipe.

Question. Copy the orders table from db.prod to db.staging over a single shell pipe.

Input. Two reachable Postgres endpoints, network connectivity from your machine to both.

Code.

# Stream prod → staging through your terminal
psql -h db.prod -U analytics -d warehouse \
     -c "COPY orders TO STDOUT WITH (FORMAT BINARY)" \
| psql -h db.staging -U analytics -d warehouse \
     -c "COPY orders FROM STDIN WITH (FORMAT BINARY)"
Enter fullscreen mode Exit fullscreen mode

Step-by-step explanation.

  1. The left psql opens a connection to prod, issues COPY orders TO STDOUT. The server streams the table's contents in PostgreSQL binary format to psql's stdout.
  2. The shell pipe routes those bytes to the right psql's stdin.
  3. The right psql opens a connection to staging, issues COPY orders FROM STDIN. psql streams stdin into the server, which interprets the binary format and inserts rows.
  4. Both backends are streaming — memory is constant. The only bottleneck is the laptop's uplink (and the slower of the two DB endpoints).
  5. FORMAT BINARY saves 30–50% on bytes-over-wire vs CSV, and skips the ASCII parse step on both ends.

Output. A full table copy with no intermediate file, no schema dump, no migration tool. Pure pipe.

Rule of thumb. When both endpoints are Postgres and you want a table-level copy, this two-psql pipe is shorter than every alternative. Add pv -ab between them for a live throughput readout.

PostgreSQL interview question on bulk-load architecture

A senior interviewer often asks: "Your team needs to ingest a 200 GB CSV into a managed Postgres database every night. Walk me through the bulk-load architecture, the COPY vs \copy decision, and the tuning flags you'd set."

Solution Using client-side \copy with a four-step tuning pattern

#!/usr/bin/env bash
# nightly_ingest.sh — 200 GB CSV → RDS Postgres
set -euo pipefail

PGHOST=db.cluster-x.us-east-1.rds.amazonaws.com
PGUSER=analytics
PGDATABASE=warehouse
PGSSLMODE=verify-full
export PGHOST PGUSER PGDATABASE PGSSLMODE

# 1) Pre-load (transactional, fast)
psql -v ON_ERROR_STOP=1 <<'EOF'
BEGIN;
SET synchronous_commit = OFF;
ALTER TABLE events SET UNLOGGED;
ALTER TABLE events DISABLE TRIGGER ALL;
COMMIT;
EOF

# 2) Bulk-load — \copy streams the file through psql
psql -v ON_ERROR_STOP=1 -c "
  \\copy events FROM PROGRAM 'pv events_${1}.csv' \
       WITH (FORMAT CSV, HEADER, NULL '\\\\N')
"

# 3) Rebuild indexes (parallel where possible)
psql -v ON_ERROR_STOP=1 <<'EOF'
SET max_parallel_maintenance_workers = 4;
REINDEX TABLE CONCURRENTLY events;
EOF

# 4) Post-load — restore durability, re-enable triggers, analyze
psql -v ON_ERROR_STOP=1 <<'EOF'
ALTER TABLE events SET LOGGED;
ALTER TABLE events ENABLE TRIGGER ALL;
ANALYZE events;
EOF
Enter fullscreen mode Exit fullscreen mode

Step-by-step trace.

Phase Wall-clock Why
Pre-load (UNLOGGED, async commit, no triggers) 0.1 s one transaction, three ALTERs
\copy with pv 38 min streams 200 GB at ~90 MB/s through libpq
REINDEX CONCURRENTLY 12 min parallel, non-blocking
Post-load (LOGGED, triggers, ANALYZE) 2 min small ALTERs + planner stats refresh
Total ~52 min down from 4+ hours naive

The script is idempotent — re-running it picks up where it left off because ALTER TABLE ... SET UNLOGGED is a no-op on an already-unlogged table.

Output:

Step What changed
1 events table marked UNLOGGED + async commit + no triggers
2 200 GB ingested via \copy
3 all indexes rebuilt
4 durability restored, triggers re-enabled, planner stats fresh

Why this works — concept by concept:

  • \copy is the only choice on RDS — managed Postgres forbids server-side COPY ... FROM '/path' because the client cannot reach the DB host's filesystem. \copy runs on your machine, streams to the server.
  • UNLOGGED table for the load — skips the WAL write per row. Cuts wall-clock by ~30% on most workloads. Restore to LOGGED after the load.
  • synchronous_commit = OFF — relaxes durability for the load. Crash-recovery still works; only the last few seconds of commits are at risk.
  • DISABLE TRIGGER ALL — triggers fire per row. For a 200 GB load that is 1.4 B trigger fires. Disable for the load, re-enable after.
  • REINDEX CONCURRENTLY — rebuilds indexes from the freshly-loaded data without blocking reads. Parallel-aware on PG 14+.
  • ANALYZE last — refreshes planner stats so subsequent queries use the new index sizes and row counts.
  • Cost — O(rows) for the load; O(rows · log rows) for each index rebuild. Disk I/O is the bottleneck; the script wrings every cycle out of it.

SQL
Topic — ETL
ETL bulk-load problems

Practice →


4. Connection strings, .pgpass, and the auth flow

libpq merges five input sources in a strict priority order before opening the TCP connection — knowing the order eliminates 90% of "wrong host" tickets

The mental model in one line: libpq reads connection parameters from CLI args, the connection URL, environment variables, ~/.pg_service.conf, and ~/.pgpass in that priority order — the first source to supply a value wins, and the merged result is what opens the TCP connection. Once you know the order, every "why am I connecting to the wrong host?" question becomes a five-step audit.

Visual diagram of the psql connection and auth flow — left a stack of input sources (CLI args, URL, env vars, .pgpass, pg_service.conf), middle the libpq merging step, right the connection with SSL handshake; with side cards explaining sslmode levels and IAM auth tokens for RDS / Cloud SQL; on a light PipeCode card.

The four ways to specify a connection.

Form Example When to use
Positional CLI args psql -h db -p 5432 -U me -d warehouse one-off, easy to type
Connection URL psql 'postgresql://me:pw@db:5432/warehouse?sslmode=require' one-liner with embedded password, great for env vars
Key-value string psql 'host=db port=5432 user=me dbname=warehouse sslmode=verify-full' most readable for many options
Env vars PGHOST=db PGUSER=me psql the CI / Docker default

The full set of PG* env vars.

  • PGHOST — server hostname or socket path (default: socket).
  • PGPORT — TCP port (default: 5432).
  • PGUSER — database role (default: OS user).
  • PGPASSWORD — password in plain text; OK for ephemeral CI tokens, never commit.
  • PGDATABASE — database name (default: same as PGUSER).
  • PGSERVICE — named profile from ~/.pg_service.conf or /etc/pg_service.conf.
  • PGSSLMODEdisable / allow / prefer / require / verify-ca / verify-full.
  • PGSSLROOTCERT — path to the CA root cert used for SSL verification.
  • PGCONNECT_TIMEOUT — seconds before psql gives up on the TCP connect.
  • PGAPPNAME — string visible in pg_stat_activity.application_name; tag your sessions.

The .pgpass file — your password store of last resort.

  • Location. ~/.pgpass on Unix, %APPDATA%\postgresql\pgpass.conf on Windows.
  • Format. One line per credential: host:port:database:user:password. Use * as a wildcard.
  • Permissions. Must be chmod 600 — libpq silently ignores any .pgpass file world- or group-readable. No warning, no error. Senior DE gotcha.
  • Example.
# ~/.pgpass — chmod 600 required
db.prod.example.com:5432:warehouse:analytics:s3cret_prod_token
db.staging.example.com:5432:*:analytics:staging_token
*:5432:postgres:postgres:local_only_admin
Enter fullscreen mode Exit fullscreen mode

~/.pg_service.conf — named connection profiles.

# ~/.pg_service.conf
[prod-warehouse]
host=db.prod.example.com
port=5432
user=analytics
dbname=warehouse
sslmode=verify-full
sslrootcert=/etc/ssl/aws-rds-bundle.pem

[staging-warehouse]
host=db.staging.example.com
port=5432
user=analytics
dbname=warehouse
sslmode=require
Enter fullscreen mode Exit fullscreen mode

Then psql service=prod-warehouse connects with all those parameters. Pair with ~/.pgpass for the password. Migrating to a new password is a one-line edit instead of updating every CI job.

The five SSL modes and what each guarantees.

sslmode Encrypted? Server cert verified? MITM-safe?
disable no no no
allow maybe no no
prefer yes if available no no
require yes no no — accepts any cert
verify-ca yes chain verified partially — DNS hijack still wins
verify-full yes chain + hostname verified yes

Use verify-full in production. Always. Every other mode either skips encryption or skips the verification that prevents man-in-the-middle attacks.

IAM authentication on RDS and Cloud SQL.

  • RDS. aws rds generate-db-auth-token --hostname $PGHOST --port 5432 --username analytics returns a short-lived (15-minute) token. Set PGPASSWORD to it and connect. No static passwords in CI.
  • Cloud SQL. gcloud sql generate-login-token returns an equivalent token. Or use the Cloud SQL Auth Proxy which handles the token transparently.
  • Why this matters. Static passwords end up in git log or in CI logs. Short-lived tokens cannot be exfiltrated for long.

Worked example — the libpq priority order in action

Detailed explanation. A new engineer sets PGHOST=db.staging in their shell, then types psql -h db.prod -U me -d warehouse. They expect to land on staging because the env var is set. They land on prod because CLI args override env vars. This is a daily source of confusion until you internalise the priority order.

Question. Given a session with PGHOST=db.staging and PGUSER=etl exported, what does psql -h db.prod -d warehouse connect to? Walk through the merge.

Input.

export PGHOST=db.staging
export PGUSER=etl
export PGPASSWORD=staging_token

# ~/.pgpass also exists:
# db.prod:5432:warehouse:prod_user:prod_token
Enter fullscreen mode Exit fullscreen mode

Code.

$ psql -h db.prod -d warehouse
# Which host? Which user? Which password?
Enter fullscreen mode Exit fullscreen mode

Step-by-step explanation.

  1. CLI -h wins for host. db.prod overrides PGHOST=db.staging. Host = db.prod.
  2. CLI did not supply -U. Falls to env var. User = etl.
  3. CLI did not supply -d directly via -U convention. -d warehouse was supplied. Database = warehouse.
  4. Password resolution. PGPASSWORD is set → that wins over .pgpass. Password = staging_token.
  5. The result. Connect to db.prod as etl with password staging_token. This will likely fail authentication because the staging_token is the wrong password for prod.

The fix in this scenario is to unset PGPASSWORD so .pgpass is consulted, OR pass -U prod_user to match the .pgpass entry.

Output. A connection failure with a useful error message: psql: error: connection to server at "db.prod" failed: FATAL: password authentication failed for user "etl".

Rule of thumb. When a connection fails, run psql -h ... --echo-all -L /dev/stderr and look at the resolved parameters. Or just run \conninfo immediately after a successful connection to confirm where you actually landed.

Worked example — .pgpass chmod 600 silent failure

Detailed explanation. .pgpass permissions matter. If the file is chmod 644 or 0666, libpq silently ignores it — no warning, no error. Connections that worked yesterday fail today after a git pull reset the file mode. This is the single most common .pgpass gotcha.

Question. A .pgpass exists with the right entries, but psql still prompts for a password. Show the diagnostic and fix.

Input.

$ ls -la ~/.pgpass
-rw-r--r-- 1 me me 87 Jun  6 09:14 /home/me/.pgpass

$ cat ~/.pgpass
db.prod:5432:warehouse:analytics:correct_password
Enter fullscreen mode Exit fullscreen mode

Code.

# Diagnostic
$ PGPASSWORD='' psql -h db.prod -U analytics -d warehouse
Password for user analytics:        # libpq did not read .pgpass

# Check the mode
$ stat -c '%a %n' ~/.pgpass
644 /home/me/.pgpass                 # too permissive

# Fix
$ chmod 600 ~/.pgpass

$ stat -c '%a %n' ~/.pgpass
600 /home/me/.pgpass

# Retry
$ psql -h db.prod -U analytics -d warehouse
warehouse=>                         # connected without prompt
Enter fullscreen mode Exit fullscreen mode

Step-by-step explanation.

  1. The initial connection prompts for a password — first signal that .pgpass is not being read.
  2. PGPASSWORD='' explicitly empties the env var so it cannot supply the password; this forces libpq to fall through to .pgpass.
  3. stat -c '%a %n' shows the file mode is 644 (world-readable). libpq requires 600 (owner read/write only).
  4. chmod 600 ~/.pgpass tightens permissions. No need to restart anything; libpq reads .pgpass on every connection.
  5. The retry succeeds silently — no password prompt.

Output. The connection now uses .pgpass instead of an interactive prompt. Same for every CI script that depends on the file.

Rule of thumb. Any .pgpass change should be followed by ls -la ~/.pgpass. If the mode is not -rw-------, fix it. The silent-ignore is the #1 source of "but it worked yesterday" tickets.

Worked example — connecting via SSH tunnel + SSL + IAM in one command

Detailed explanation. Many production databases are inside a VPC and only reachable via a bastion host. The connection is SSH → bastion → DB, with SSL on the DB side and IAM-issued short-lived tokens for auth. The whole chain fits in one shell command.

Question. Write the one-command psql invocation that SSH-tunnels to a bastion, opens an SSL connection to an RDS instance, and authenticates with an IAM token instead of a static password.

Input. A bastion bastion.example.com, an RDS endpoint reachable from inside the VPC, AWS CLI configured with credentials.

Code.

# 1) Open SSH tunnel in background — port 25432 on laptop → 5432 on RDS
ssh -fN -L 25432:db.cluster-x.us-east-1.rds.amazonaws.com:5432 \
        ec2-user@bastion.example.com

# 2) Generate an IAM auth token (15-minute lifetime)
export PGPASSWORD=$(aws rds generate-db-auth-token \
    --hostname db.cluster-x.us-east-1.rds.amazonaws.com \
    --port 5432 \
    --username analytics)

# 3) Connect via the tunnel, with SSL verification of the RDS cert
psql "host=localhost \
      port=25432 \
      user=analytics \
      dbname=warehouse \
      sslmode=verify-full \
      sslrootcert=/etc/ssl/aws-rds-bundle.pem"
Enter fullscreen mode Exit fullscreen mode

Step-by-step explanation.

  1. ssh -fN -L opens a background tunnel: laptop's localhost:25432 now forwards to RDS port 5432 through the bastion. The -f daemonises, -N skips opening a shell.
  2. aws rds generate-db-auth-token calls the RDS API and returns a short-lived (15-min) signed token. We export it as PGPASSWORD.
  3. The psql connection string targets localhost:25432 (the tunnel) but tells libpq to verify SSL against aws-rds-bundle.pem — even though the TCP connection is local, the TLS handshake is end-to-end with RDS, so the cert verification still works.
  4. sslmode=verify-full verifies both the cert chain and the hostname. Without this, a malicious bastion could MITM the connection.
  5. The session lasts as long as the token is valid (15 min). Re-running step 2 refreshes it.

Output. A fully authenticated SSL connection from your laptop, through a bastion, to a private RDS instance, with no static passwords anywhere in the chain.

Rule of thumb. Production database access from a developer laptop should look like this: SSH tunnel + SSL verify-full + IAM token. Anything simpler is a security audit finding waiting to happen.

PostgreSQL interview question on connection string design

A senior interviewer might ask: "Your team has 30 microservices connecting to the same Postgres cluster. Design the connection-string strategy so password rotation is one file edit and SSL is always verify-full." This is the operational-design probe.

Solution Using pg_service.conf + .pgpass + verify-full

# ~/.pg_service.conf — shipped to every container as a config-map
[warehouse-prod]
host=db.cluster-x.us-east-1.rds.amazonaws.com
port=5432
user=analytics
dbname=warehouse
sslmode=verify-full
sslrootcert=/etc/ssl/aws-rds-bundle.pem
application_name=service-orders

[warehouse-replica]
host=db-replica.cluster-x.us-east-1.rds.amazonaws.com
port=5432
user=analytics_read
dbname=warehouse
sslmode=verify-full
sslrootcert=/etc/ssl/aws-rds-bundle.pem
application_name=service-orders-read

# ~/.pgpass — generated at deploy time from a secret manager, chmod 600
db.cluster-x.us-east-1.rds.amazonaws.com:5432:warehouse:analytics:CURRENT_TOKEN
db-replica.cluster-x.us-east-1.rds.amazonaws.com:5432:warehouse:analytics_read:CURRENT_READ_TOKEN

# Service connects with:
psql service=warehouse-prod
psql service=warehouse-replica
Enter fullscreen mode Exit fullscreen mode

Step-by-step trace.

Operation Edit needed Services affected
Rotate prod password one line in .pgpass every service that uses warehouse-prod
Add SSL pinning one line in .pg_service.conf every consumer of the changed service
Add a new read replica one new [warehouse-replica-2] block + .pgpass line only services that opt in
Change application_name per service edit the per-service .pg_service.conf block one service

The 30 services all consume named profiles. The credentials live in one place; the connection metadata lives in another.

Output:

Layer Source of truth Permission
Connection metadata ~/.pg_service.conf shipped via config-map
Passwords ~/.pgpass chmod 600, generated at deploy from secrets manager
SSL trust anchor /etc/ssl/aws-rds-bundle.pem shipped with the container
Service code psql service=warehouse-prod no credentials in source

Why this works — concept by concept:

  • pg_service.conf as named profiles — every connection metadata field lives in one file. Services reference profiles by name; no string-concatenation in the code.
  • .pgpass as the password store — separate from the metadata file. Generated at deploy from a secrets manager (Vault, AWS Secrets Manager, Kubernetes secrets), chmod 600 enforced.
  • sslmode=verify-full + sslrootcert — the only safe SSL configuration for production. Every profile has it; no service can accidentally skip verification.
  • application_name per servicepg_stat_activity shows which service is running which query. Indispensable for incident response.
  • Rotation is one file edit — replacing the token in .pgpass (or running a secret-update job that regenerates the file) rotates every service simultaneously, without code changes.
  • Cost — zero runtime overhead. libpq parses both files once per process; the merged result is cached for the connection lifetime.

SQL
Topic — database
Database design and connectivity (SQL)

Practice →


5. Scripting psql — variables, conditionals, and CI-friendly runs

psql commands like \set, \gset, \if, ON_ERROR_STOP, and --single-transaction turn psql into a real scripting host — every migration framework eventually depends on this surface

The mental model in one line: psql variables live on the client and expand via :var (literal), :'var' (quoted literal), or :"var" (quoted identifier); \gset captures query results into variables; \if branches; ON_ERROR_STOP=1 aborts on first error. Master these and you can write idempotent, branching migrations that run identically in dev, staging, and production.

Visual diagram of psql scripting features — top a horizontal flow from shell to psql showing -c vs -f and ON_ERROR_STOP=1; middle a panel of \set / \gset / \if branching; bottom a panel of CI-friendly output flags (--csv, --tuples-only, --no-align); on a light PipeCode card.

Single command vs script — the -c vs -f choice.

  • psql -c "SQL" — one-shot. Connection opened, single SQL string executed, connection closed. Ideal for one-liner shell automation.
  • psql -f file.sql — full script. The file can contain multiple SQL statements, meta-commands, comments, and \i includes. The connection lasts until the file ends.
  • psql -v ON_ERROR_STOP=1 -f file.sql — the single most important CI flag. Without it, a failing statement is logged but the script continues; with it, the script aborts immediately.
  • psql --single-transaction -f file.sql — wraps the entire -f file in BEGIN; ... COMMIT;. Combined with ON_ERROR_STOP, you get all-or-nothing migration semantics for free.

Variables and substitution.

  • \set var value — sets a client-side variable. Value is stored as text.
  • :var — expands the variable literally (no quoting). Use for SQL identifiers and numbers.
  • :'var' — expands the variable as a quoted string literal. Use for string values in WHERE clauses.
  • :"var" — expands the variable as a quoted identifier (table or column name). Use for dynamic table names.
  • \unset var — removes the variable.
  • \echo :var — prints the variable's value to stderr; useful for debugging.

\gset — capture a query result into variables.

SELECT count(*) AS n FROM orders \gset
\echo Loaded :n orders.
Enter fullscreen mode Exit fullscreen mode

The trailing \gset (no semicolon!) captures column names from the SELECT into psql variables. After this, :n expands to the row count. Indispensable for idempotent migration scripts.

Conditionals — \if / \elif / \else / \endif.

SELECT exists(
    SELECT 1 FROM information_schema.tables
    WHERE table_schema = 'public' AND table_name = 'orders'
) AS has_orders \gset

\if :has_orders
    \echo orders table exists - skipping create
\else
    \echo creating orders table
    CREATE TABLE orders (...);
\endif
Enter fullscreen mode Exit fullscreen mode

Note that \if evaluates psql-side strings (t / f for booleans). Use with \gset to drive branching from real database state.

CI-friendly output flags.

Flag Effect Use case
--csv output rows as CSV downstream awk / jq / pandas
--tuples-only (-t) drop headers and footers clean numeric output
--no-align (-A) no column padding `key
{% raw %}--field-separator (-F) custom separator TSV: -F$'\t'
\pset format json JSON rows structured logging
\pset format unaligned bare values shell-friendly

The "ON_ERROR_STOP everywhere" pattern.

  • Always start CI scripts with psql -v ON_ERROR_STOP=1. Without it, a failed CREATE INDEX is logged but the script proceeds to the next statement — silently leaving the database in a half-migrated state.
  • For interactive sessions, set it in ~/.psqlrc: \set ON_ERROR_STOP on. Saves you from typo-rollback regret.

Worked example — an idempotent migration with \gset and \if

Detailed explanation. A migration script must be safe to re-run. The script checks whether each change has already been applied, and skips it if so. \gset + \if is the canonical psql pattern; no separate migration framework required.

Question. Write a migration that adds an email_verified column to users and an index on it, but skips both operations if they already exist.

Input. A users table that may or may not already have the column / index.

Code.

-- migrate_001_email_verified.sql
\set ON_ERROR_STOP on
\timing on

-- 1) Check whether the column exists
SELECT exists(
    SELECT 1
    FROM information_schema.columns
    WHERE table_schema = 'public'
      AND table_name   = 'users'
      AND column_name  = 'email_verified'
) AS col_exists \gset

\if :col_exists
    \echo SKIP  column users.email_verified already exists
\else
    \echo APPLY  adding column users.email_verified
    ALTER TABLE users ADD COLUMN email_verified boolean NOT NULL DEFAULT false;
\endif

-- 2) Check whether the index exists
SELECT exists(
    SELECT 1
    FROM pg_indexes
    WHERE schemaname = 'public'
      AND tablename  = 'users'
      AND indexname  = 'users_email_verified_idx'
) AS idx_exists \gset

\if :idx_exists
    \echo SKIP  index users_email_verified_idx already exists
\else
    \echo APPLY  building index users_email_verified_idx
    CREATE INDEX CONCURRENTLY users_email_verified_idx
        ON users(email_verified)
        WHERE email_verified = true;
\endif

\echo Migration complete.
Enter fullscreen mode Exit fullscreen mode

Step-by-step explanation.

  1. \set ON_ERROR_STOP on aborts the script on the first error. Combined with the \if branches, this gives clean all-or-nothing semantics.
  2. The first SELECT against information_schema.columns returns one row with a boolean column col_exists. The trailing \gset captures that into the psql variable :col_exists.
  3. \if :col_exists branches on the variable. If the column exists, log "SKIP"; otherwise apply the ALTER.
  4. Same pattern for the index: query pg_indexes, capture into :idx_exists, branch.
  5. The script can be re-run any number of times. Subsequent runs find both col_exists = t and idx_exists = t, and skip both operations — idempotent.
  6. CREATE INDEX CONCURRENTLY cannot run inside a transaction. We do not wrap this script with --single-transaction; the migration is sequence-of-statements rather than atomic.

Output. A migration that is safe to run, re-run, and partially complete. First run: applies both changes. Second run: both skipped. Crashes mid-way: only the unfinished change is left to retry.

Rule of thumb. Use the \gset + \if pattern for every migration that needs idempotency without a separate framework. The pattern fits in 40 lines per migration and is auditable in plain SQL.

Worked example — capturing a count and asserting an invariant

Detailed explanation. A common CI check is "after the migration, the row count must match the source." \gset captures both counts, then a \if asserts equality. Failures abort the run with a clear error.

Question. After bulk-loading events, verify the row count matches an expected value passed in as a variable.

Input. A psql session with \set expected_rows 4218394 already set (or passed in via -v expected_rows=4218394).

Code.

-- assert_event_count.sql
\set ON_ERROR_STOP on

SELECT count(*) AS actual_rows FROM events \gset

\echo Expected: :expected_rows
\echo Actual:   :actual_rows

SELECT (:actual_rows = :expected_rows)::text AS match \gset

\if :match
    \echo OK  event count matches
\else
    \echo MISMATCH  aborting
    SELECT 1/0;  -- forces an error → ON_ERROR_STOP aborts the script
\endif
Enter fullscreen mode Exit fullscreen mode

Step-by-step explanation.

  1. \set ON_ERROR_STOP on makes any error fatal.
  2. The first SELECT counts rows; \gset captures the result into :actual_rows.
  3. \echo prints both variables to stderr for the CI log.
  4. The second SELECT compares them as text and \gsets the result into :match (either t or f).
  5. The \if :match branch acknowledges success.
  6. The \else branch raises a deliberate error (SELECT 1/0) so ON_ERROR_STOP kicks in and the script exits non-zero. CI now fails the build cleanly.

Output. A 12-line CI assertion that either logs OK — event count matches or aborts with a non-zero exit code and a clear message.

Rule of thumb. Any post-migration assertion can use this pattern. Capture both sides with \gset, compare as text, branch on the boolean, raise a real error to abort. No external test framework needed.

Worked example — quoting modes — :var vs :'var' vs :"var"

Detailed explanation. The three quoting modes are the source of half of psql scripting bugs. :var expands literally — for numbers and SQL fragments. :'var' adds string-literal quotes — for WHERE clause values. :"var" adds identifier quotes — for dynamic table names.

Question. Given \set table users, \set email 'me@x.com', and \set limit 5, write a query that selects from the table named in :table, filters where email matches :email, and limits to :limit rows. Use the right quoting form for each.

Input. A psql session with three variables set.

Code.

\set table users
\set email 'me@x.com'
\set limit 5

-- WRONG — :table expands to bare `users`, OK; :email expands to bare me@x.com, BAD (syntax error)
SELECT * FROM :table WHERE email = :email LIMIT :limit;

-- RIGHT — quote forms picked by data type
SELECT *
FROM :"table"                   -- identifier quotes: SELECT * FROM "users"
WHERE email = :'email'          -- string literal quotes: WHERE email = 'me@x.com'
LIMIT :limit;                   -- bare expansion: LIMIT 5

-- Expanded (what the server sees):
-- SELECT * FROM "users" WHERE email = 'me@x.com' LIMIT 5;
Enter fullscreen mode Exit fullscreen mode

Step-by-step explanation.

  1. :table expands to users literally. For an identifier, this works only if users is a valid bare identifier (no spaces, no reserved word). The :"table" form wraps it in double quotes — safe for any identifier.
  2. :email expands to me@x.com literally — which is not valid SQL because the string is not quoted. :'email' expands to 'me@x.com' — proper string literal.
  3. :limit expands to 5 — a bare integer literal. LIMIT 5 is the right SQL.
  4. Mistaking these is a common bug: WHERE email = :email produces WHERE email = me@x.com, which the parser interprets as comparing to the me column, fails parse, and (without ON_ERROR_STOP) might silently continue.

Output. The expanded query reads naturally; the quoting modes do their jobs without per-line thought.

Rule of thumb. Numbers → :var. Strings → :'var'. Table or column names → :"var". Three quoting forms, three jobs — memorise the pairing.

Worked example — CI-friendly output — --csv and --tuples-only

Detailed explanation. When psql output feeds another tool (jq, awk, pandas, a notifier), the default aligned-table format is unusable. The --csv, --tuples-only, and --no-align flags strip headers, footers, and padding to give clean machine-readable output.

Question. Write a one-liner that prints a single number (the order count) suitable for use in a shell variable.

Input. A shell, psql in the path.

Code.

# WRONG — full table output, multi-line, includes headers and a "(1 row)" footer
$ psql -h db.prod -d warehouse -c "SELECT count(*) FROM orders"
 count
-------
  4218
(1 row)

# RIGHT — tuples-only + no-align gives one bare value
$ psql -h db.prod -d warehouse -t -A -c "SELECT count(*) FROM orders"
4218

# Capture into a shell variable
$ COUNT=$(psql -h db.prod -d warehouse -t -A -c "SELECT count(*) FROM orders")
$ echo "There are $COUNT orders."
There are 4218 orders.

# CSV for multi-column queries
$ psql -h db.prod -d warehouse --csv \
       -c "SELECT region, count(*) FROM orders GROUP BY region"
region,count
EU,2100
US,1900
NULL,218
Enter fullscreen mode Exit fullscreen mode

Step-by-step explanation.

  1. The default output is human-readable but has headers, padding, and a row-count footer — none of which is useful for a shell variable.
  2. -t (tuples-only) drops the header and footer.
  3. -A (no-align) drops column padding. Combined with -t, we get one bare value per row.
  4. $( ... ) in bash captures stdout into a variable. The single-line, untrimmed output is now a clean integer.
  5. For multi-column results, --csv gives proper CSV with a header row — feed it to cut, awk, or pandas.read_csv.

Output. A pipeline-ready bare value (or CSV) instead of a human-readable table.

Rule of thumb. Any shell script that consumes psql output should pass -tA (single value) or --csv (multi-column). The default aligned format is for humans only.

PostgreSQL interview question on idempotent migrations in CI

A senior interviewer might ask: "Write a CI-runnable Postgres migration script that adds a column, backfills it from another column, builds an index, and is safe to re-run. Walk me through the failure modes." This is the operational-scripting probe.

Solution Using \gset + \if + ON_ERROR_STOP + --single-transaction

#!/usr/bin/env bash
# run_migration.sh — idempotent, transactional where possible
set -euo pipefail

PGHOST=db.prod
PGUSER=migrator
PGDATABASE=warehouse
export PGHOST PGUSER PGDATABASE

# Phase A — atomic schema change inside a single transaction
psql --single-transaction -v ON_ERROR_STOP=1 <<'EOF'
\timing on

-- Check + add column
SELECT exists(
    SELECT 1 FROM information_schema.columns
    WHERE table_schema='public' AND table_name='users' AND column_name='full_name'
) AS col_exists \gset

\if :col_exists
    \echo SKIP — column users.full_name already exists
\else
    ALTER TABLE users ADD COLUMN full_name text;
    UPDATE users SET full_name = first_name || ' ' || last_name
        WHERE full_name IS NULL;
\endif
EOF

# Phase B — non-transactional CONCURRENTLY index build
psql -v ON_ERROR_STOP=1 <<'EOF'
\timing on

SELECT exists(
    SELECT 1 FROM pg_indexes
    WHERE schemaname='public' AND indexname='users_full_name_idx'
) AS idx_exists \gset

\if :idx_exists
    \echo SKIP — index users_full_name_idx already exists
\else
    CREATE INDEX CONCURRENTLY users_full_name_idx ON users(full_name);
\endif
EOF

echo "Migration complete."
Enter fullscreen mode Exit fullscreen mode

Step-by-step trace.

Phase Wrap Why
A — add column + backfill --single-transaction both succeed together or neither happens
B — index build bare (no transaction) CREATE INDEX CONCURRENTLY cannot run inside a transaction

The split is the canonical pattern: atomic schema work goes in phase A; index work goes in phase B. Both phases are idempotent via \gset + \if.

Output:

Re-run scenario What happens
First run column added, rows backfilled, index built
Second run both \if branches skip; logs "SKIP" twice
Crash mid-backfill phase A rolls back; re-run completes both
Crash mid-index phase A is done; phase B re-runs and continues the partial index from PG's perspective

Why this works — concept by concept:

  • \gset captures real DB state — every idempotency check queries the live catalog. No external "applied migrations" table required.
  • \if branches client-side — psql evaluates the boolean and either runs the SQL block or skips it. The server only sees the SQL that actually runs.
  • --single-transaction wraps phase A — the ALTER and the UPDATE are atomic. If the backfill fails halfway, the column add is rolled back too.
  • ON_ERROR_STOP=1 in both phases — any unhandled error aborts immediately. CI sees a non-zero exit; the next migration does not run.
  • CONCURRENTLY in phase B — index build does not lock writers. Bare (no transaction) because PostgreSQL forbids it inside one. The check + build pair is still idempotent.
  • Cost — O(1) catalog queries for the checks; O(rows) for the ALTER/UPDATE and the index build. Same cost as any one-shot migration, with idempotency added for free.

SQL
Topic — SQL
SQL scripting and migration problems

Practice →


Cheat sheet — psql power-user recipes

  • List tables in a schema. \dt public.* — glob form. \dt+ public.* adds size and description.
  • Describe a table with sizes. \d+ orders — columns, types, indexes, partitions, total relation size, descriptions.
  • Time every query. \timing on interactively, or \timing on in ~/.psqlrc for every session.
  • Vertical output for wide rows. \x to toggle expanded display, or \pset expanded auto in ~/.psqlrc.
  • Load a CSV — client-side. \copy orders FROM 'orders.csv' WITH (FORMAT CSV, HEADER).
  • Load a CSV — server-side. COPY orders FROM '/var/lib/pg/orders.csv' WITH (FORMAT CSV, HEADER) (superuser only).
  • Export to CSV. \copy (SELECT * FROM orders WHERE order_date >= '2026-01-01') TO 'out.csv' WITH (FORMAT CSV, HEADER).
  • Export to gzipped CSV. \copy (SELECT * FROM orders) TO PROGRAM 'gzip > out.csv.gz' WITH (FORMAT CSV, HEADER).
  • Stream a table between databases. psql -h src -c "COPY t TO STDOUT BINARY" | psql -h dst -c "COPY t FROM STDIN BINARY".
  • Switch database in-session. \c analytics — reconnects with the same role and host.
  • Switch role and host. \c "host=db.staging user=etl dbname=warehouse" — full key-value reconnect.
  • Run a file with abort-on-error. psql -v ON_ERROR_STOP=1 -f migrate.sql.
  • Wrap a file in a transaction. psql --single-transaction -v ON_ERROR_STOP=1 -f migrate.sql.
  • Pass a variable from the shell. psql -v table=orders -c 'SELECT count(*) FROM :"table"'.
  • Capture a query row into vars. SELECT count(*) AS n FROM t \gset then \echo :n.
  • Branch on DB state. \if :n > 0 ... \else ... \endif.
  • Show the underlying SQL of any \d command. psql -E — every meta-command prints its expansion.
  • List active sessions. SELECT pid, usename, state, query FROM pg_stat_activity WHERE state='active'; plus \x auto for readability.
  • Find longest-running query. SELECT pid, now() - query_start AS dur, query FROM pg_stat_activity WHERE state='active' ORDER BY query_start LIMIT 5;.
  • One-bare-value to a shell var. COUNT=$(psql -tA -c "SELECT count(*) FROM orders").
  • CSV output to stdout. psql --csv -c "SELECT region, count(*) FROM orders GROUP BY region".
  • JSON output. \pset format json then run any SELECT — each row becomes a JSON object.
  • Run a shell command from psql. \! ls -la — escape to the shell without exiting.
  • Set the visible app name. PGAPPNAME=etl-nightly psql ... shows up in pg_stat_activity.application_name.
  • Cancel a runaway query. Ctrl-C in psql sends a cancel request; the server backend stops at the next safe point.

Frequently asked questions

COPY vs \copy — which should I use?

Use \copy for any remote or managed database (RDS, Aurora, Cloud SQL, Azure DB for PostgreSQL) — it streams the file over the wire from the client and runs as your connecting role, no superuser required. Use server-side COPY ... FROM '/path' only when psql is running on the same host as the database and you have access to pg_read_server_files (or you're superuser). The format options (FORMAT CSV, HEADER, DELIMITER, etc.) are identical between the two; only the file location and permission model differ. As a rule, if your file is on the same machine as the database, prefer COPY for a small speed win; otherwise \copy is the only option.

Where do I store my psql password safely?

Put it in ~/.pgpass with mode chmod 600. Format is host:port:database:user:password (one per line, * as wildcard). libpq silently ignores any .pgpass not owned by you with mode 600 — no warning, no error — so check ls -la ~/.pgpass if .pgpass is being ignored. For production CI, use short-lived IAM tokens generated by aws rds generate-db-auth-token or gcloud sql generate-login-token and pipe into PGPASSWORD; this avoids static passwords in CI configs. Never commit .pgpass to git, and never set PGPASSWORD in a shell history file.

How do I run a SQL file from the shell and abort on first error?

psql -v ON_ERROR_STOP=1 -f migrate.sql is the canonical CI invocation. The -v ON_ERROR_STOP=1 flag tells psql to exit non-zero on the first SQL or meta-command error — without it, psql logs the error and continues, leaving the database in a half-migrated state. Pair with --single-transaction to wrap the whole file in BEGIN; ... COMMIT; so any failure rolls everything back atomically (except statements like CREATE INDEX CONCURRENTLY that cannot run inside a transaction — those need to live in a separate, non-transactional script).

Why are my \set variables not visible inside DO $$ ... $$ blocks?

Because \set variables live on the client and are expanded before the SQL is sent to the server. By the time the server parses the DO $$ ... $$ block, the :var markers inside have already been substituted with their text values — there is no further substitution inside the dollar-quoted string. So \set name 'Alice' followed by DO $$ BEGIN RAISE NOTICE 'Hello :name'; END $$; does NOT print "Hello Alice" — it prints "Hello :name" literally because the substitution happens outside the dollar-quoted string. The fix is to put the variable expansion outside the DO block: DO $$ BEGIN RAISE NOTICE 'Hello %', ':name'; END $$; becomes DO $$ BEGIN RAISE NOTICE 'Hello %', ' :'name' '; END $$; — clunky. Most teams just use server-side variables via SET LOCAL or pass parameters via EXECUTE instead.

Can I use psql with a connection pooler like PgBouncer?

Yes, but with caveats depending on the pooler's mode. In session mode, PgBouncer dedicates a backend to each client connection for its whole life — psql features work normally, including \set, SET statements, prepared statements, and LISTEN/NOTIFY. In transaction mode (the most common production setup), PgBouncer hands you a different backend after every COMMIT — so \set variables are still fine (client-side), but server-side SET, LISTEN, prepared statements, and temp tables all break because the backend rotates. In statement mode even multi-statement transactions break. As a rule, develop against direct connections and let CI / app servers go through the pooler, and ask your DBA which mode the pooler runs in before relying on any session-state feature.

How do I get psql to output JSON or CSV for downstream pipelines?

For CSV use --csv (psql 12+) which produces proper RFC-4180 CSV with a header row: psql --csv -c "SELECT region, count(*) FROM orders GROUP BY region". For a single bare value (no header, no footer), use -tA (tuples-only, no-align): psql -tA -c "SELECT count(*) FROM orders" returns just the integer. For JSON, use \pset format json interactively or --no-align --tuples-only plus a SELECT that wraps the row in to_jsonb(): psql -tA -c "SELECT to_jsonb(t) FROM (SELECT ...) t". For NDJSON streams (one JSON object per line, ideal for log pipelines), the to_jsonb + -tA combination is the canonical shape. All three forms stream row-by-row, so memory cost stays constant even for millions of rows.

Practice on PipeCode

Pipecode.ai is Leetcode for Data Engineering — every psql recipe above pairs with hands-on practice rooms where you write the `\copy`, the `\gset`-driven migration, and the `pg_stat_statements` diagnostic against real graded inputs. PipeCode pairs every reading with 450+ DE-focused problems and a real-time scoring engine, so your `psql commands` muscle memory survives the day the interviewer says "show me how you'd debug this in psql."

Practice PostgreSQL now →
ETL bulk-load drills →

Top comments (0)