Teradata Full-Text Search
Production-ready full-text search for Teradata 17.x and 20.x with BM25 relevance scoring.
Teradata Version Support
This system supports both Teradata 17.x and 20.x with the following compatibility:
| Feature | Teradata 17.x | Teradata 20.x |
|---|---|---|
| TOKENIZE_TEXT_C / TOKENIZE_TEXT_V UDFs | ✅ Fully supported | ✅ Fully supported |
| BM25_SCORE UDF | ✅ Fully supported | ✅ Fully supported |
| Index Management Procedures | ✅ Fully supported | ✅ Fully supported |
| Direct SQL Search Queries | ✅ Fully supported | ✅ Fully supported |
| SEARCH_FTS Convenience Procedure | ❌ Not available | ✅ Available |
For Teradata 17 users: All core functionality works perfectly. The SEARCH_FTS procedure (Pattern 4 below) is not available due to TD 17 stored procedure limitations, but you can use direct SQL queries (Patterns 1-3) for the same functionality.
Prerequisites
BTEQ Installation
BTEQ (Basic Teradata Query) is required for deployment. The deployment scripts rely on BTEQ to execute all SQL commands.
Installing BTEQ:
- BTEQ is included in Teradata Tools and Utilities (TTU)
- Download TTU from Teradata's official website (requires Teradata account)
- Install the version appropriate for your operating system:
- Linux/Mac: Teradata Tools and Utilities for Linux/Mac
- Windows: Teradata Tools and Utilities for Windows
Verify Installation:
bteq
This should display BTEQ version information. If the command is not found, ensure TTU is properly installed and BTEQ is in your system PATH.
Network Configuration:
- BTEQ must have network access to your Teradata server
- Ensure firewall allows connections to Teradata (typically port 1025)
- Test connectivity before deployment
Database Access
You need:
- Access to a Teradata 17.x or 20.x instance
- Database credentials with CREATE DATABASE privilege
- The deployment automatically creates the schema and grants all necessary privileges
Database/Schema Creation:
The deployment script sql/00_create_schema.sql automatically creates the fts_metadata database (1GB permanent space). The user who creates the database automatically becomes the owner with all necessary privileges.
If automatic creation fails (insufficient privileges or DBA-managed databases), ask your DBA to create the database and grant you access:
-- Create the FTS metadata database (1GB permanent space)
CREATE DATABASE fts_metadata AS PERMANENT = 1e9;
-- Grant all privileges to the deployment user (only needed if DBA created the database)
GRANT ALL ON fts_metadata TO your_username;
Note: The database name fts_metadata is hardcoded in all stored procedures and should not be changed. All FTS components (tables, UDFs, procedures) are created in this schema.
System Requirements
- Operating System: Linux, macOS, or Windows
- Disk Space: Minimal (~50MB for package)
- Network: Access to Teradata server
What Gets Installed
All components are created in the fts_metadata schema:
Tables:
- fts_indexes - Index metadata and statistics
- fts_term_index - Inverted index (term → documents)
- fts_document_stats - Document lengths
- fts_term_stats - Term frequencies
UDFs (User-Defined Functions):
- TOKENIZE_TEXT_C(clob_text) - Tokenizes CLOB text of any size; handles invalid UTF-8 gracefully. Use this in all search queries.
- TOKENIZE_TEXT_V(varchar_text) - Tokenizes VARCHAR(32000) CHARACTER SET UNICODE text; ~3× faster, used internally by the _V stored procedures. Not suitable for search queries — see note below.
- BM25_SCORE(...) - Calculates relevance scores
Stored Procedures:
Each bulk-indexing operation has two variants — _C (CLOB, any document size, handles bad UTF-8) and _V (VARCHAR ≤ 32000 chars, ~3× faster). Choose one and use it consistently for a given index.
- CREATE_FTS_INDEX_C / CREATE_FTS_INDEX_V - Creates and populates an index from a source table
- DROP_FTS_INDEX - Removes an index and all associated data
- REBUILD_FTS_INDEX_C / REBUILD_FTS_INDEX_V - Rebuilds an index after bulk data changes
- INDEX_SINGLE_DOCUMENT_C / INDEX_SINGLE_DOCUMENT_V - Adds or re-indexes a single document
- DELETE_SINGLE_DOCUMENT - Removes a single document from the index
- UPDATE_SINGLE_DOCUMENT_C / UPDATE_SINGLE_DOCUMENT_V - Re-indexes an existing document after a source-table update
- SEARCH_FTS - Simplified search interface that populates a volatile table with results (TD 20 only)
Views:
- v_fts_indexes - Lists all indexes (quotes stripped from identifiers)
- v_fts_indexes_current - Lists indexes in the current database (quotes stripped)
- v_fts_index_columns - Normalized column list with one row per column
Deployment
Step 1: Extract the Package
Linux/Mac:
unzip teradata-fulltext-*.zip
cd teradata-fulltext-*/
Windows:
# Extract using Windows Explorer or 7-Zip, then:
cd teradata-fulltext-<version-number>
Step 2: Configure Credentials
Edit the BTEQ credentials in bteq/install.bteq:
1. Open bteq/install.bteq in a text editor
2. Find the .logon line and replace the placeholders:
   .logon <HOSTNAME>/<USERNAME>,<PASSWORD>
   - <HOSTNAME> - Your Teradata server hostname
   - <USERNAME> - Your database username
   - <PASSWORD> - Your database password
3. Find the GRANT ALL line and replace <USERNAME>:
   GRANT ALL ON fts_metadata TO <USERNAME>;
4. Save the file
Example:
.logon prod-teradata/myuser,mypassword
...
GRANT ALL ON fts_metadata TO myuser;
Note: Use the same username in both places.
Step 3: Run Deployment
Linux/Mac:
./deploy.sh
Windows:
deploy.bat
Manual (BTEQ):
bteq < bteq/install.bteq
Deploying to Teradata 17
If deploying to Teradata 17.x, you must copy the TD 17 header files before running deployment:
# From the project root directory
cp include/17.0/sqltypes_td.h udf_tokenize_text_c/sqltypes_td.h
cp include/17.0/sqltypes_td.h udf_tokenize_text_v/sqltypes_td.h
cp include/17.0/sqltypes_td.h udf_bm25_score/sqltypes_td.h
# Then deploy normally
./deploy.sh
Why is this needed?
- Teradata 17 and 20 have different internal type sizes (LOB_LOCATOR: 64 bytes vs 122 bytes)
- When UDFs are created, Teradata compiles the C source code on the server
- The sqltypes_td.h header file must match the target Teradata version
After deployment to TD 17:
- All UDFs and procedures will work (except SEARCH_FTS)
- Use Patterns 1-3 for searching (Pattern 4 requires TD 20+)
- To deploy to TD 20 later, restore the TD 20 headers first:
cp include/20.0/sqltypes_td.h udf_tokenize_text_c/sqltypes_td.h
cp include/20.0/sqltypes_td.h udf_tokenize_text_v/sqltypes_td.h
cp include/20.0/sqltypes_td.h udf_bm25_score/sqltypes_td.h
Verification
Check that deployment succeeded:
-- List indexes (should be empty initially)
SELECT * FROM fts_metadata.v_fts_indexes;
-- Verify UDFs exist
HELP FUNCTION fts_metadata.TOKENIZE_TEXT_C;
HELP FUNCTION fts_metadata.TOKENIZE_TEXT_V;
HELP FUNCTION fts_metadata.BM25_SCORE;
-- Verify procedures exist
HELP PROCEDURE fts_metadata.CREATE_FTS_INDEX_C;
HELP PROCEDURE fts_metadata.CREATE_FTS_INDEX_V;
Usage
Important: Quoted vs Unquoted Identifiers
Teradata distinguishes between quoted and unquoted identifiers:
- Unquoted identifiers (e.g., MyTable) are converted to UPPERCASE (MYTABLE)
- Quoted identifiers (e.g., "MyTable") are case-sensitive
When calling FTS procedures, provide identifiers exactly as they should appear in SQL:
-- For standard uppercase tables/columns (most common):
CALL fts_metadata.CREATE_FTS_INDEX_C(
'my_index',
'mydb', -- Becomes MYDB
'employees', -- Becomes EMPLOYEES
'emp_id', -- Primary key column
'name', -- Becomes NAME
result_msg
);
-- For case-sensitive quoted identifiers:
CALL fts_metadata.CREATE_FTS_INDEX_C(
'my_index',
'"MyDB"', -- Case-sensitive "MyDB"
'"EmployeeData"',-- Case-sensitive "EmployeeData"
'"EmpId"', -- Case-sensitive primary key column
'"FullName"', -- Case-sensitive "FullName"
result_msg
);
-- Mixed (some quoted, some not):
CALL fts_metadata.CREATE_FTS_INDEX_C(
'my_index',
'prod_db', -- Becomes PROD_DB
'"MixedCase"', -- Case-sensitive "MixedCase"
'id', -- Primary key column
'description', -- Becomes DESCRIPTION
result_msg
);
Rule: Pass identifiers as they would appear in a SELECT statement. If your table was created as CREATE TABLE "MyTable", pass '"MyTable"' to the procedure.
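The folding rule can be sketched in Python. This is an illustration of the documented behaviour only, not Teradata's actual name-resolution code:

```python
def normalize_identifier(ident: str) -> str:
    """Sketch of how an identifier string passed to the FTS procedures
    is resolved: quoted names are case-sensitive, unquoted names fold
    to uppercase (illustrative only, not the server's logic)."""
    if len(ident) >= 2 and ident.startswith('"') and ident.endswith('"'):
        # Quoted: the quotes delimit the exact, case-sensitive name
        return ident[1:-1]
    # Unquoted: folded to uppercase
    return ident.upper()

assert normalize_identifier('employees') == 'EMPLOYEES'
assert normalize_identifier('"MyTable"') == 'MyTable'
```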
Index Management
Index names are global: they must be unique across all schemas — there is no per-schema namespace. If you index two different tables, give each index a distinct name (e.g. sales_products_idx and hr_employees_idx rather than products_idx in both schemas).
-- Create an index — use _C for CLOB/large documents, _V for VARCHAR data (3× faster)
CALL fts_metadata.CREATE_FTS_INDEX_C(
'my_index',
DATABASE, -- Current database
'my_table',
'id', -- Primary key column (used to identify documents in search results)
'col1,col2,col3', -- Text columns to index
result_msg -- OUT parameter for status message
);
-- VARCHAR variant (text truncated to 32000 chars, ~3× faster):
-- CALL fts_metadata.CREATE_FTS_INDEX_V('my_index', DATABASE, 'my_table', 'id', 'col1,col2,col3', result_msg);
-- List all indexes
SELECT * FROM fts_metadata.v_fts_indexes;
-- Rebuild after data changes (use same variant as CREATE)
CALL fts_metadata.REBUILD_FTS_INDEX_C('my_index', result_msg);
-- CALL fts_metadata.REBUILD_FTS_INDEX_V('my_index', result_msg);
-- Add a newly inserted row to the index
CALL fts_metadata.INDEX_SINGLE_DOCUMENT_C('my_index', CAST(42 AS VARCHAR(512)), result_msg);
-- Refresh the index after a source-table UPDATE
CALL fts_metadata.UPDATE_SINGLE_DOCUMENT_C('my_index', CAST(42 AS VARCHAR(512)), result_msg);
-- Remove a deleted row from the index
CALL fts_metadata.DELETE_SINGLE_DOCUMENT('my_index', CAST(42 AS VARCHAR(512)), result_msg);
-- Drop an index entirely
CALL fts_metadata.DROP_FTS_INDEX('my_index', result_msg);
Searching
Always use TOKENIZE_TEXT_C in search queries, not TOKENIZE_TEXT_V.
TOKENIZE_TEXT_V expects its input to already be VARCHAR(32000) CHARACTER SET UNICODE — the format produced inside the _V stored procedures when text is fetched from fts_temp_docs_v. When called from a search query with an ordinary string literal (which is Latin/ASCII in a standard BTEQ session), the UDF receives the wrong byte layout and produces garbage tokens that match nothing in the index.
TOKENIZE_TEXT_V is used exclusively by CREATE_FTS_INDEX_V, REBUILD_FTS_INDEX_V, and the other _V procedures. It is not exposed for direct use.
All examples below assume a source table and index like this:
CREATE TABLE mydb.products (
product_id INTEGER,
sku CHAR(20),
name VARCHAR(256),
description CLOB,
PRIMARY KEY (product_id)
);
CALL fts_metadata.CREATE_FTS_INDEX_C(
'products_idx',
'mydb',
'products',
'product_id',
'name,description',
result_msg
);
Pattern 1: Unranked search — fastest, no scoring
Returns matching rows from the source table. Use this when you need to find documents containing any of the search terms but don't need relevance ordering.
SELECT
p.product_id,
p.sku,
p.name,
p.description
FROM TABLE(fts_metadata.TOKENIZE_TEXT_C('wireless headphones')) AS qt(token, pos)
INNER JOIN fts_metadata.fts_term_index ti
ON ti.term = qt.token
AND ti.index_name = 'products_idx'
INNER JOIN mydb.products p
ON CAST(p.product_id AS VARCHAR(512)) = ti.document_id
GROUP BY
p.product_id,
p.sku,
p.name,
p.description;
The GROUP BY collapses duplicates that arise when a document matches multiple
search terms.
Pattern 2: Ranked search — results ordered by relevance
Returns source table columns plus a BM25 relevance score. Higher scores mean
a stronger match. Use SAMPLE N to limit results.
SELECT
p.product_id,
p.sku,
p.name,
p.description,
scores.score
FROM (
SELECT
ti.document_id,
SUM(
fts_metadata.BM25_SCORE(
ti.term_frequency,
ds.doc_length,
idx.avg_doc_length,
ts.document_frequency,
idx.document_count
)
) AS score
FROM TABLE(fts_metadata.TOKENIZE_TEXT_C('wireless headphones')) AS qt(token, pos)
INNER JOIN fts_metadata.fts_term_index ti
ON ti.term = qt.token
AND ti.index_name = 'products_idx'
INNER JOIN fts_metadata.fts_document_stats ds
ON ds.index_name = ti.index_name
AND ds.document_id = ti.document_id
INNER JOIN fts_metadata.fts_term_stats ts
ON ts.index_name = ti.index_name
AND ts.term = ti.term
CROSS JOIN fts_metadata.fts_indexes idx
WHERE idx.index_name = 'products_idx'
GROUP BY ti.document_id
) scores
INNER JOIN mydb.products p
ON CAST(p.product_id AS VARCHAR(512)) = scores.document_id
ORDER BY scores.score DESC
SAMPLE 10;
Joining on different primary key types
The index stores document_id as VARCHAR(512). The join condition must cast
your source table's primary key to the same type:
-- INTEGER primary key
ON CAST(p.product_id AS VARCHAR(512)) = scores.document_id
-- VARCHAR primary key
ON CAST(p.order_ref AS VARCHAR(512)) = scores.document_id
-- CHAR primary key (CAST preserves the fixed-length padding stored during indexing)
ON CAST(p.part_code AS VARCHAR(512)) = scores.document_id
Pattern 3: SEARCH_FTS procedure (TD 20 only) — simplest option
On Teradata 20, the SEARCH_FTS stored procedure handles the join internally
and writes results into a volatile table that persists for your session.
CALL fts_metadata.SEARCH_FTS(
'products_idx', -- index name
'wireless headphones', -- search terms
'mydb.products', -- source table in database.table format
'product_id', -- primary key column name
'product_id, sku, name, description',-- columns to return
10 -- maximum number of results
);
SELECT * FROM fts_search_results ORDER BY score DESC;
The volatile table fts_search_results contains all the columns you specified
plus a score column. You can query it like any other table for the rest of
your session:
-- Filter by minimum relevance
SELECT product_id, name, score
FROM fts_search_results
WHERE score > 2.0
ORDER BY score DESC;
Note: SEARCH_FTS is not available on Teradata 17. Use Pattern 1 or Pattern 2
on TD 17.
Query Syntax
The search query is a plain string of words. There is no special query language — no boolean operators, no field qualifiers, no wildcards. The same tokenizer that processes indexed text is applied to the query string, so the rules are identical.
What Counts as a Token
| Character type | Behaviour |
|---|---|
| Letters (A-Z, a-z) | Included, lowercased |
| Digits (0-9) | Included as-is |
| Underscore (_) | Included, treated as part of the token |
| Hyphen (-), period (.), apostrophe ('), etc. | Delimiters — split the token |
| Whitespace | Delimiter |
| Non-ASCII (UTF-8) | Included, case is NOT changed |
Examples of tokenization:
| Input | Tokens produced |
|---|---|
| wireless headphones | wireless, headphones |
| e-mail address | e, mail, address |
| it's a B2B deal | it, s, a, b2b, deal |
| part_number XR-400 | part_number, xr, 400 |
| "quoted phrase" | quoted, phrase |
| 100% | 100 |
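The documented rules can be approximated in Python. This is a sketch of the behaviour described above, not the UDF's C implementation, and it omits the byte positions the real function returns:

```python
import re

def tokenize(text: str) -> list[str]:
    """Approximate the documented tokenizer: runs of letters, digits,
    and underscores form tokens; everything else delimits. ASCII
    letters are lowercased; non-ASCII characters keep their case."""
    # \w in Python's Unicode mode matches letters, digits, underscore,
    # and non-ASCII letters, which mirrors the documented token rules.
    tokens = re.findall(r'\w+', text)
    # Lowercase only ASCII characters, leaving non-ASCII untouched.
    return [''.join(c.lower() if c.isascii() else c for c in t)
            for t in tokens]

assert tokenize('e-mail address') == ['e', 'mail', 'address']
assert tokenize('part_number XR-400') == ['part_number', 'xr', '400']
```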
No stop word filtering. Common words like the, a, is are indexed and searchable just like any other word. Queries containing these words will match more documents and score them lower (BM25 down-weights high-frequency terms automatically).
Multi-Word Queries — OR Semantics
Every term in the query is looked up independently. A document is included in results if it contains any of the query terms. Documents that match more terms, or match rarer terms, receive higher BM25 scores.
-- Matches documents containing "wireless" OR "headphones" (or both)
-- Documents matching both terms score higher
SELECT * FROM TABLE(fts_metadata.TOKENIZE_TEXT_C('wireless headphones')) AS t(token, pos);
-- Returns: 'wireless', 'headphones'
There is no way to require that all terms appear in a document — this is a BM25 scoring system, not a boolean filter. Use the score to control result quality rather than trying to enforce term presence.
What Is NOT Supported
| Feature | Workaround |
|---|---|
| AND — require all terms | Not supported. Filter by minimum score to approximate. |
| NOT — exclude a term | Not supported. Filter results in the outer query. |
| "phrase" — exact phrase | Not supported. Individual words are matched independently. |
| wild* — prefix/wildcard | Not supported. Exact tokens only. |
| field:term — field-scoped | Not supported. All indexed columns are searched together. |
| Minimum score threshold | Use WHERE score > N in Pattern 1/2, or HAVING in the subquery. |
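Besides score filtering, AND semantics can also be approximated client-side: run a Pattern 1 query that additionally selects ti.term, then keep only documents whose matched terms cover every query token. A Python sketch, with hypothetical result rows standing in for the fetched data:

```python
from collections import defaultdict

# Hypothetical (document_id, matched_term) rows, as a Pattern 1 query
# would return if it also selected ti.term.
rows = [
    ('1', 'wireless'), ('1', 'headphones'),
    ('2', 'wireless'),
    ('3', 'headphones'), ('3', 'wireless'),
]
query_terms = {'wireless', 'headphones'}

# Collect the set of query terms each document matched
matched = defaultdict(set)
for doc_id, term in rows:
    matched[doc_id].add(term)

# AND semantics: keep documents that matched every query term
and_docs = sorted(d for d, terms in matched.items()
                  if terms >= query_terms)
assert and_docs == ['1', '3']
```

The SQL equivalent of the same idea would be grouping by document and requiring the distinct matched-term count to equal the number of query tokens, at the cost of an extra aggregation.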
Examples
Single-word search
-- All products mentioning "bluetooth"
SELECT ti.document_id
FROM TABLE(fts_metadata.TOKENIZE_TEXT_C('bluetooth')) AS qt(token, pos)
INNER JOIN fts_metadata.fts_term_index ti
ON ti.term = qt.token AND ti.index_name = 'products_idx';
Multi-word search (OR, ranked)
-- Products matching "wireless", "bluetooth", or "headphones" — best matches first
SELECT
p.product_id,
p.name,
scores.score
FROM (
SELECT
ti.document_id,
SUM(
fts_metadata.BM25_SCORE(
ti.term_frequency,
ds.doc_length,
idx.avg_doc_length,
ts.document_frequency,
idx.document_count
)
) AS score
FROM TABLE(fts_metadata.TOKENIZE_TEXT_C('wireless bluetooth headphones')) AS qt(token, pos)
INNER JOIN fts_metadata.fts_term_index ti
ON ti.term = qt.token AND ti.index_name = 'products_idx'
INNER JOIN fts_metadata.fts_document_stats ds
ON ds.index_name = ti.index_name AND ds.document_id = ti.document_id
INNER JOIN fts_metadata.fts_term_stats ts
ON ts.index_name = ti.index_name AND ts.term = ti.term
CROSS JOIN fts_metadata.fts_indexes idx
WHERE idx.index_name = 'products_idx'
GROUP BY ti.document_id
) scores
INNER JOIN mydb.products p
ON CAST(p.product_id AS VARCHAR(512)) = scores.document_id
ORDER BY scores.score DESC
SAMPLE 10;
Filtering weak matches by score
-- Only return results with a meaningful relevance score
SELECT p.product_id, p.name, scores.score
FROM ( ... ) scores
INNER JOIN mydb.products p ON CAST(p.product_id AS VARCHAR(512)) = scores.document_id
WHERE scores.score > 1.0
ORDER BY scores.score DESC;
Checking how a query is tokenized
-- Useful for debugging unexpected search results
SELECT token, pos
FROM TABLE(fts_metadata.TOKENIZE_TEXT_C('your search query here')) AS t(token, pos)
ORDER BY pos;
Handling Real-World Data Issues
Invalid UTF-8 Characters
The indexing system automatically handles documents containing invalid UTF-8 byte sequences:
What happens during indexing:
- Documents are processed sequentially
- If invalid UTF-8 is detected during tokenization, that document is skipped
- Processing continues with remaining documents
- Final result message reports the count of skipped documents
Example:
CALL fts_metadata.CREATE_FTS_INDEX_C('my_index', DATABASE, 'my_table', 'id', 'text_column', result_msg);
-- Result: "Success: 1000 documents (5 skipped due to invalid UTF-8), 4874 terms"
What this means:
- 995 documents were indexed successfully
- 5 documents contained invalid UTF-8 and were skipped
- The index is usable and contains 4874 unique terms
Important notes:
- Skipped documents will not appear in search results
- No manual intervention is required
- If you need to index all documents, fix the UTF-8 encoding in your source data first
- Common causes: binary data in text fields, incorrect character set conversions, legacy encoding issues
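If you want to locate offending rows before indexing, a client-side pre-check can be sketched in Python. This assumes you can fetch the raw bytes of the text column (for example via an export); it is a diagnostic aid, not part of the FTS system:

```python
def is_invalid_utf8(raw: bytes) -> bool:
    """Return True if the byte string is not valid UTF-8."""
    try:
        raw.decode('utf-8')
        return False
    except UnicodeDecodeError:
        return True

# Valid UTF-8 passes; a stray 0xFF byte (common binary-in-text
# contamination) fails.
assert is_invalid_utf8('café'.encode('utf-8')) is False
assert is_invalid_utf8(b'\xff\xfe broken') is True
```

Rows flagged this way can be repaired (e.g. re-encoded with `errors='replace'`) before running CREATE_FTS_INDEX_C so that no documents are skipped.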
Performance: The _C path processes approximately 7 documents per second (~2m 22s for 1000 docs). The _V path is ~3× faster (~45s for 1000 docs) when data is known to be clean UTF-8.
Performance Tips
- Use unranked search (Pattern 1) when you don't need scoring - much faster
- Adjust HAVING threshold (e.g., 0.5) to filter weak matches
- Use SAMPLE N instead of TOP N for better performance
- Index only searchable columns - Don't over-index
- Rebuild indexes periodically after bulk data changes
Examples
See sql/examples.sql for complete working examples including:
- Test data creation
- Index creation and management
- All three search patterns
- Tokenization examples
- Performance tips
Troubleshooting
Problem: "Database 'fts_metadata' does not exist" Solution: The schema is created automatically. If it fails:
CREATE DATABASE fts_metadata AS PERMANENT = 1e9;
Then redeploy.
Problem: "CREATE DATABASE failed - insufficient privileges" Solution: Your user needs CREATE DATABASE privilege. Contact your DBA to grant:
GRANT CREATE DATABASE TO your_username;
Or have the DBA create the database and grant you access:
CREATE DATABASE fts_metadata AS PERMANENT = 1e9;
GRANT ALL ON fts_metadata TO your_username;
Problem: "The user does not have CREATE FUNCTION access to database fts_metadata" Solution: After creating the database, grant yourself all privileges:
GRANT ALL ON fts_metadata TO your_username;
Then redeploy.
Problem: "CREATE TABLE/PROCEDURE/FUNCTION failed - access denied" Solution: Same as above - ensure you have all necessary privileges on the fts_metadata database:
GRANT ALL ON fts_metadata TO your_username;
Problem: "Stored Procedure Created with Warnings" Solution: This is normal for procedures that call other procedures. Safe to ignore.
Problem: Deployment fails partway through Solution: Drop everything and redeploy:
DELETE DATABASE fts_metadata;
DROP DATABASE fts_metadata;
Then run ./deploy.sh again.
Problem: Search returns no results Check:
- You are using TOKENIZE_TEXT_C in your query, not TOKENIZE_TEXT_V (see note in the Searching section)
- Index exists: SELECT * FROM fts_metadata.v_fts_indexes;
- Index has data: SELECT COUNT(*) FROM fts_metadata.fts_term_index WHERE index_name = 'your_index';
- Search terms are being tokenized: SELECT * FROM TABLE(fts_metadata.TOKENIZE_TEXT_C('your search')) AS t(token, pos);
Problem: Some documents don't appear in search results Possible causes:
- Documents were skipped due to invalid UTF-8 encoding (check the result message from CREATE_FTS_INDEX_C; the _V variant will fail rather than skip)
- Documents don't contain any of the search terms after tokenization (punctuation is stripped; hyphens split words, e.g. "e-mail" becomes "e" and "mail")
- Documents exist but their text columns are NULL or empty
How It Works
BM25 Scoring
BM25 (Best Matching 25) is a probabilistic ranking algorithm that ranks documents by relevance to a query. It improves on TF-IDF by:
- Limiting the impact of term frequency (saturation)
- Normalizing for document length
- Weighting rare terms higher than common ones
Score Range:
- BM25 scores are unbounded (not normalized to 0-100)
- Minimum score: 0.0 (no match or zero term frequency)
- Maximum score: No fixed upper limit
- Typical ranges:
- Single term match: 0.5 to 5.0
- Multi-term queries: 2.0 to 15.0 (sum of individual term scores)
- Rare terms score higher than common terms
- Scores depend on term frequency, document length, and corpus statistics
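The shape of the formula can be sketched in Python. The BM25_SCORE UDF's exact constants are not documented here, so this sketch uses the textbook Okapi BM25 form with the common defaults k1 = 1.2 and b = 0.75, taking the same five inputs the UDF receives:

```python
import math

def bm25_term(tf, doc_len, avg_doc_len, doc_freq, doc_count,
              k1=1.2, b=0.75):
    """One term's BM25 contribution, mirroring BM25_SCORE's argument
    list. k1/b are textbook defaults; the UDF's actual constants may
    differ."""
    # Rare terms (low doc_freq relative to doc_count) get a high IDF
    idf = math.log((doc_count - doc_freq + 0.5) / (doc_freq + 0.5) + 1.0)
    # Term frequency saturates, normalized by document length
    tf_norm = (tf * (k1 + 1)
               / (tf + k1 * (1 - b + b * doc_len / avg_doc_len)))
    return idf * tf_norm

# Rare terms contribute more than common ones, all else equal
rare = bm25_term(tf=3, doc_len=100, avg_doc_len=120,
                 doc_freq=5, doc_count=1000)
common = bm25_term(tf=3, doc_len=100, avg_doc_len=120,
                   doc_freq=800, doc_count=1000)
assert rare > common > 0
```

The query's total score is the sum of these per-term contributions, which is why multi-term queries land in a higher typical range.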
Tokenization
The TOKENIZE_TEXT_C / TOKENIZE_TEXT_V functions:
- Split on whitespace and punctuation (hyphens, apostrophes, periods, etc.)
- Convert ASCII letters to lowercase (non-ASCII characters are preserved as-is)
- Include underscores as part of a token
- Return individual tokens with byte positions
- Do not filter stop words — common words are indexed normally
Inverted Index
Search is fast because we pre-compute an inverted index:
term → [document_id, term_frequency]
This allows quick lookup of which documents contain which terms.
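The mapping can be sketched as a toy in-memory version of fts_term_index (illustrative only, not the actual table layout):

```python
from collections import Counter, defaultdict

# Tiny corpus: document_id -> its tokens
docs = {
    '1': ['wireless', 'headphones', 'wireless'],
    '2': ['wired', 'headphones'],
}

# Build the inverted index: term -> {document_id: term_frequency}
index = defaultdict(dict)
for doc_id, tokens in docs.items():
    for term, tf in Counter(tokens).items():
        index[term][doc_id] = tf

# Lookup is a single dictionary access per query term, independent
# of how many documents exist.
assert index['wireless'] == {'1': 2}
assert index['headphones'] == {'1': 1, '2': 1}
```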
Uninstalling
To completely remove the full-text search system from your Teradata instance:
bteq < sql/99_uninstall.sql
Or run manually in BTEQ:
.run file = sql/99_uninstall.sql
This will remove:
- All stored procedures (CREATE_FTS_INDEX_C, CREATE_FTS_INDEX_V, DROP_FTS_INDEX, REBUILD_FTS_INDEX_C, REBUILD_FTS_INDEX_V, INDEX_SINGLE_DOCUMENT_C, INDEX_SINGLE_DOCUMENT_V, DELETE_SINGLE_DOCUMENT, UPDATE_SINGLE_DOCUMENT_C, UPDATE_SINGLE_DOCUMENT_V, SEARCH_FTS)
- All views (v_fts_indexes, v_fts_indexes_current, v_fts_index_columns)
- All UDFs (TOKENIZE_TEXT_C, TOKENIZE_TEXT_V, BM25_SCORE)
- All metadata tables (fts_indexes, fts_term_index, fts_document_stats, fts_term_stats)
- Global temporary tables (fts_temp_docs_c, fts_temp_docs_v)
- The entire fts_metadata database
Warning: This operation is permanent and cannot be undone. All indexed data will be lost. Your source tables are not affected.
Limitations
- No stop word filtering: Common words (the, a, is, etc.) are indexed and matched like any other term
- No phrase matching: Searches for individual terms, not exact phrases
- No wildcards: Exact term matching only
- Manual rebuild: Must call REBUILD_FTS_INDEX_C or REBUILD_FTS_INDEX_V after data changes
- UTF-8 validation: Documents with invalid UTF-8 are automatically skipped during indexing
- Teradata 17: SEARCH_FTS stored procedure not available (use direct SQL patterns 1-3 instead)
Support
For questions or issues:
- Review sql/examples.sql for working code
- Check Teradata logs for detailed error messages
- Verify prerequisites and permissions
Version
Check the filename of your distribution package for version information:
teradata-fulltext-YYYY-MM-version.zip