Teradata Full-Text Search
Production-ready full-text search for Teradata 17.x and 20.x with BM25 relevance scoring.
Teradata Version Support
This system supports both Teradata 17.x and 20.x with the following compatibility:
| Feature | Teradata 17.x | Teradata 20.x |
|---|---|---|
| TOKENIZE_TEXT_C / TOKENIZE_TEXT_V UDFs | ✅ Fully supported | ✅ Fully supported |
| BM25_SCORE UDF | ✅ Fully supported | ✅ Fully supported |
| Index Management Procedures | ✅ Fully supported | ✅ Fully supported |
| Direct SQL Search Queries | ✅ Fully supported | ✅ Fully supported |
| SEARCH_FTS Convenience Procedure | ❌ Not available | ✅ Available |
For Teradata 17 users: All core functionality works perfectly. The SEARCH_FTS procedure (Pattern 4 below) is not available due to TD 17 stored procedure limitations, but you can use direct SQL queries (Patterns 1-3) for the same functionality.
Prerequisites
BTEQ Installation
BTEQ (Basic Teradata Query) is required for deployment. The deployment scripts rely on BTEQ to execute all SQL commands.
Installing BTEQ:
- BTEQ is included in Teradata Tools and Utilities (TTU)
- Download TTU from Teradata's official website (requires Teradata account)
- Install the version appropriate for your operating system:
- Linux/Mac: Teradata Tools and Utilities for Linux/Mac
- Windows: Teradata Tools and Utilities for Windows
Verify Installation:
bteq
This should display BTEQ version information. If the command is not found, ensure TTU is properly installed and BTEQ is in your system PATH.
Network Configuration:
- BTEQ must have network access to your Teradata server
- Ensure firewall allows connections to Teradata (typically port 1025)
- Test connectivity before deployment
Database Access
You need:
- Access to a Teradata 17.x or 20.x instance
- Database credentials with CREATE DATABASE privilege
- The deployment automatically creates the schema and grants all necessary privileges
Database/Schema Creation:
The deployment script sql/00_create_schema.sql automatically creates the fts_metadata database (1GB permanent space). The user who creates the database automatically becomes the owner with all necessary privileges.
If automatic creation fails (insufficient privileges or DBA-managed databases), ask your DBA to create the database and grant you access:
-- Create the FTS metadata database (1GB permanent space)
CREATE DATABASE fts_metadata AS PERMANENT = 1e9;
-- Grant all privileges to the deployment user (only needed if DBA created the database)
GRANT ALL ON fts_metadata TO your_username;
Note: The database name fts_metadata is hardcoded in all stored procedures and should not be changed. All FTS components (tables, UDFs, procedures) are created in this schema.
System Requirements
- Operating System: Linux, macOS, or Windows
- Disk Space: Minimal (~50MB for package)
- Network: Access to Teradata server
What Gets Installed
All components are created in the fts_metadata schema:
Tables:
- fts_indexes - Index metadata and statistics
- fts_term_index - Inverted index (term → documents)
- fts_document_stats - Document lengths
- fts_term_stats - Term frequencies
UDFs (User-Defined Functions):
- TOKENIZE_TEXT_C(clob_text) - Tokenizes CLOB text of any size; handles invalid UTF-8 gracefully. Use this in all search queries.
- TOKENIZE_TEXT_V(varchar_text) - Tokenizes VARCHAR(32000) CHARACTER SET UNICODE text; ~3× faster, used internally by the _V stored procedures. Not suitable for search queries — see note below.
- BM25_SCORE(...) - Calculates relevance scores
Stored Procedures:
Each bulk-indexing operation has two variants — _C (CLOB, any document size, handles bad UTF-8) and _V (VARCHAR ≤ 32000 chars, ~3× faster). Choose one and use it consistently for a given index.
- CREATE_FTS_INDEX_C / CREATE_FTS_INDEX_V - Creates and populates an index from a source table
- DROP_FTS_INDEX - Removes an index and all associated data
- REBUILD_FTS_INDEX_C / REBUILD_FTS_INDEX_V - Rebuilds an index after bulk data changes
- INDEX_SINGLE_DOCUMENT_C / INDEX_SINGLE_DOCUMENT_V - Adds or re-indexes a single document
- DELETE_SINGLE_DOCUMENT - Removes a single document from the index
- UPDATE_SINGLE_DOCUMENT_C / UPDATE_SINGLE_DOCUMENT_V - Re-indexes an existing document after a source-table update
- SEARCH_FTS - Simplified search interface that populates a volatile table with results (TD 20 only)
Views:
- v_fts_indexes - Lists all indexes (quotes stripped from identifiers)
- v_fts_indexes_current - Lists indexes in the current database (quotes stripped)
- v_fts_index_columns - Normalized column list with one row per column
Deployment
Step 1: Extract the Package
Linux/Mac:
unzip teradata-fulltext-*.zip
cd teradata-fulltext-*/
Windows:
# Extract using Windows Explorer or 7-Zip, then:
cd teradata-fulltext-<version-number>
Step 2: Configure Credentials
Edit the BTEQ credentials in bteq/install.bteq:
1. Open bteq/install.bteq in a text editor
2. Find the .logon line and replace the placeholders:
   .logon <HOSTNAME>/<USERNAME>,<PASSWORD>
   - <HOSTNAME> - Your Teradata server hostname
   - <USERNAME> - Your database username
   - <PASSWORD> - Your database password
3. Find the GRANT ALL line and replace <USERNAME>:
   GRANT ALL ON fts_metadata TO <USERNAME>;
4. Save the file
Example:
.logon prod-teradata/myuser,mypassword
...
GRANT ALL ON fts_metadata TO myuser;
Note: Use the same username in both places.
Step 3: Run Deployment
Linux/Mac:
./deploy.sh
Windows:
deploy.bat
Manual (BTEQ):
bteq < bteq/install.bteq
Deploying to Teradata 17
If deploying to Teradata 17.x, you must copy the TD 17 header files before running deployment:
# From the project root directory
cp include/17.0/sqltypes_td.h udf_tokenize_text_c/sqltypes_td.h
cp include/17.0/sqltypes_td.h udf_tokenize_text_v/sqltypes_td.h
cp include/17.0/sqltypes_td.h udf_bm25_score/sqltypes_td.h
# Then deploy normally
./deploy.sh
Why is this needed?
- Teradata 17 and 20 have different internal type sizes (LOB_LOCATOR: 64 bytes vs 122 bytes)
- When UDFs are created, Teradata compiles the C source code on the server
- The sqltypes_td.h header file must match the target Teradata version
After deployment to TD 17:
- All UDFs and procedures will work (except SEARCH_FTS)
- Use Patterns 1-3 for searching (Pattern 4 requires TD 20+)
- To deploy to TD 20 later, restore the TD 20 headers first:
cp include/20.0/sqltypes_td.h udf_tokenize_text_c/sqltypes_td.h
cp include/20.0/sqltypes_td.h udf_tokenize_text_v/sqltypes_td.h
cp include/20.0/sqltypes_td.h udf_bm25_score/sqltypes_td.h
Verification
Check that deployment succeeded:
-- List indexes (should be empty initially)
SELECT * FROM fts_metadata.v_fts_indexes;
-- Verify UDFs exist
HELP FUNCTION fts_metadata.TOKENIZE_TEXT_C;
HELP FUNCTION fts_metadata.TOKENIZE_TEXT_V;
HELP FUNCTION fts_metadata.BM25_SCORE;
-- Verify procedures exist
HELP PROCEDURE fts_metadata.CREATE_FTS_INDEX_C;
HELP PROCEDURE fts_metadata.CREATE_FTS_INDEX_V;
Usage
Important: Quoted vs Unquoted Identifiers
Teradata distinguishes between quoted and unquoted identifiers:
- Unquoted identifiers (e.g., MyTable) are converted to UPPERCASE (MYTABLE)
- Quoted identifiers (e.g., "MyTable") are case-sensitive
When calling FTS procedures, provide identifiers exactly as they should appear in SQL:
-- For standard uppercase tables/columns (most common):
CALL fts_metadata.CREATE_FTS_INDEX_C(
'my_index',
'mydb', -- Becomes MYDB
'employees', -- Becomes EMPLOYEES
'emp_id', -- Primary key column
'name', -- Becomes NAME
result_msg
);
-- For case-sensitive quoted identifiers:
CALL fts_metadata.CREATE_FTS_INDEX_C(
'my_index',
'"MyDB"', -- Case-sensitive "MyDB"
'"EmployeeData"',-- Case-sensitive "EmployeeData"
'"EmpId"', -- Case-sensitive primary key column
'"FullName"', -- Case-sensitive "FullName"
result_msg
);
-- Mixed (some quoted, some not):
CALL fts_metadata.CREATE_FTS_INDEX_C(
'my_index',
'prod_db', -- Becomes PROD_DB
'"MixedCase"', -- Case-sensitive "MixedCase"
'id', -- Primary key column
'description', -- Becomes DESCRIPTION
result_msg
);
Rule: Pass identifiers as they would appear in a SELECT statement. If your table was created as CREATE TABLE "MyTable", pass '"MyTable"' to the procedure.
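The folding rule can be sketched in Python. This is an illustration of the documented behaviour only, not Teradata's actual name-resolution code:

```python
def normalize_identifier(ident: str) -> str:
    """Sketch of how an identifier string passed to the FTS procedures
    is resolved: quoted names are case-sensitive, unquoted names fold
    to uppercase (illustrative only, not the server's logic)."""
    if len(ident) >= 2 and ident.startswith('"') and ident.endswith('"'):
        # Quoted: the quotes delimit the exact, case-sensitive name
        return ident[1:-1]
    # Unquoted: folded to uppercase
    return ident.upper()

assert normalize_identifier('employees') == 'EMPLOYEES'
assert normalize_identifier('"MyTable"') == 'MyTable'
```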
Index Management
Index names are global: they must be unique across all schemas — there is no per-schema namespace. If you index two different tables, give each index a distinct name (e.g. sales_products_idx and hr_employees_idx rather than products_idx in both schemas).
-- Create an index — use _C for CLOB/large documents, _V for VARCHAR data (3× faster)
CALL fts_metadata.CREATE_FTS_INDEX_C(
'my_index',
DATABASE, -- Current database
'my_table',
'id', -- Primary key column (used to identify documents in search results)
'col1,col2,col3', -- Text columns to index
result_msg -- OUT parameter for status message
);
-- VARCHAR variant (text truncated to 32000 chars, ~3× faster):
-- CALL fts_metadata.CREATE_FTS_INDEX_V('my_index', DATABASE, 'my_table', 'id', 'col1,col2,col3', result_msg);
-- List all indexes
SELECT * FROM fts_metadata.v_fts_indexes;
-- Rebuild after data changes (use same variant as CREATE)
CALL fts_metadata.REBUILD_FTS_INDEX_C('my_index', result_msg);
-- CALL fts_metadata.REBUILD_FTS_INDEX_V('my_index', result_msg);
-- Add a newly inserted row to the index
CALL fts_metadata.INDEX_SINGLE_DOCUMENT_C('my_index', CAST(42 AS VARCHAR(512)), result_msg);
-- Refresh the index after a source-table UPDATE
CALL fts_metadata.UPDATE_SINGLE_DOCUMENT_C('my_index', CAST(42 AS VARCHAR(512)), result_msg);
-- Remove a deleted row from the index
CALL fts_metadata.DELETE_SINGLE_DOCUMENT('my_index', CAST(42 AS VARCHAR(512)), result_msg);
-- Drop an index entirely
CALL fts_metadata.DROP_FTS_INDEX('my_index', result_msg);
Searching
Always use TOKENIZE_TEXT_C in search queries, not TOKENIZE_TEXT_V.
TOKENIZE_TEXT_V expects its input to already be VARCHAR(32000) CHARACTER SET UNICODE — the format produced inside the _V stored procedures when text is fetched from fts_temp_docs_v. When called from a search query with an ordinary string literal (which is Latin/ASCII in a standard BTEQ session), the UDF receives the wrong byte layout and produces garbage tokens that match nothing in the index.
TOKENIZE_TEXT_V is used exclusively by CREATE_FTS_INDEX_V, REBUILD_FTS_INDEX_V, and the other _V procedures. It is not exposed for direct use.
All examples below assume a source table and index like this:
CREATE TABLE mydb.products (
product_id INTEGER,
sku CHAR(20),
name VARCHAR(256),
description CLOB,
PRIMARY KEY (product_id)
);
CALL fts_metadata.CREATE_FTS_INDEX_C(
'products_idx',
'mydb',
'products',
'product_id',
'name,description',
result_msg
);
Pattern 1: Unranked search — fastest, no scoring
Returns matching rows from the source table. Use this when you need to find documents containing any of the search terms but don't need relevance ordering.
SELECT
p.product_id,
p.sku,
p.name,
p.description
FROM TABLE(fts_metadata.TOKENIZE_TEXT_C('wireless headphones')) AS qt(token, pos)
INNER JOIN fts_metadata.fts_term_index ti
ON ti.term = qt.token
AND ti.index_name = 'products_idx'
INNER JOIN mydb.products p
ON CAST(p.product_id AS VARCHAR(512)) = ti.document_id
GROUP BY
p.product_id,
p.sku,
p.name,
p.description;
The GROUP BY collapses duplicates that arise when a document matches multiple
search terms.
Pattern 2: Ranked search — results ordered by relevance
Returns source table columns plus a BM25 relevance score. Higher scores mean
a stronger match. Use SAMPLE N to limit results.
SELECT
p.product_id,
p.sku,
p.name,
p.description,
scores.score
FROM (
SELECT
ti.document_id,
SUM(
fts_metadata.BM25_SCORE(
ti.term_frequency,
ds.doc_length,
idx.avg_doc_length,
ts.document_frequency,
idx.document_count
)
) AS score
FROM TABLE(fts_metadata.TOKENIZE_TEXT_C('wireless headphones')) AS qt(token, pos)
INNER JOIN fts_metadata.fts_term_index ti
ON ti.term = qt.token
AND ti.index_name = 'products_idx'
INNER JOIN fts_metadata.fts_document_stats ds
ON ds.index_name = ti.index_name
AND ds.document_id = ti.document_id
INNER JOIN fts_metadata.fts_term_stats ts
ON ts.index_name = ti.index_name
AND ts.term = ti.term
CROSS JOIN fts_metadata.fts_indexes idx
WHERE idx.index_name = 'products_idx'
GROUP BY ti.document_id
) scores
INNER JOIN mydb.products p
ON CAST(p.product_id AS VARCHAR(512)) = scores.document_id
ORDER BY scores.score DESC
SAMPLE 10;
Joining on different primary key types
The index stores document_id as VARCHAR(512). The join condition must cast
your source table's primary key to the same type:
-- INTEGER primary key
ON CAST(p.product_id AS VARCHAR(512)) = scores.document_id
-- VARCHAR primary key
ON CAST(p.order_ref AS VARCHAR(512)) = scores.document_id
-- CHAR primary key (CAST preserves the fixed-length padding stored during indexing)
ON CAST(p.part_code AS VARCHAR(512)) = scores.document_id
Pattern 3: SEARCH_FTS procedure (TD 20 only) — simplest option
On Teradata 20, the SEARCH_FTS stored procedure handles the join internally
and writes results into a volatile table that persists for your session.
CALL fts_metadata.SEARCH_FTS(
'products_idx', -- index name
'wireless headphones', -- search terms
'mydb.products', -- source table in database.table format
'product_id', -- primary key column name
'product_id, sku, name, description',-- columns to return
10 -- maximum number of results
);
SELECT * FROM fts_search_results ORDER BY score DESC;
The volatile table fts_search_results contains all the columns you specified
plus a score column. You can query it like any other table for the rest of
your session:
-- Filter by minimum relevance
SELECT product_id, name, score
FROM fts_search_results
WHERE score > 2.0
ORDER BY score DESC;
Note: SEARCH_FTS is not available on Teradata 17. Use Pattern 1 or Pattern 2
on TD 17.
Query Syntax
The search query is a plain string of words. There is no special query language — no boolean operators, no field qualifiers, no wildcards. The same tokenizer that processes indexed text is applied to the query string, so the rules are identical.
What Counts as a Token
| Character type | Behaviour |
|---|---|
| Letters (A-Z, a-z) | Included, lowercased |
| Digits (0-9) | Included as-is |
| Underscore (_) | Included, treated as part of the token |
| Hyphen (-), period (.), apostrophe ('), etc. | Delimiters — split the token |
| Whitespace | Delimiter |
| Non-ASCII (UTF-8) | Included, case is NOT changed |
Examples of tokenization:
| Input | Tokens produced |
|---|---|
| wireless headphones | wireless, headphones |
| e-mail address | e, mail, address |
| it's a B2B deal | it, s, a, b2b, deal |
| part_number XR-400 | part_number, xr, 400 |
| "quoted phrase" | quoted, phrase |
| 100% | 100 |
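The documented rules can be approximated in Python. This is a sketch of the behaviour described above, not the UDF's C implementation, and it omits the byte positions the real function returns:

```python
import re

def tokenize(text: str) -> list[str]:
    """Approximate the documented tokenizer: runs of letters, digits,
    and underscores form tokens; everything else delimits. ASCII
    letters are lowercased; non-ASCII characters keep their case."""
    # \w in Python's Unicode mode matches letters, digits, underscore,
    # and non-ASCII letters, which mirrors the documented token rules.
    tokens = re.findall(r'\w+', text)
    # Lowercase only ASCII characters, leaving non-ASCII untouched.
    return [''.join(c.lower() if c.isascii() else c for c in t)
            for t in tokens]

assert tokenize('e-mail address') == ['e', 'mail', 'address']
assert tokenize('part_number XR-400') == ['part_number', 'xr', '400']
```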
No stop word filtering. Common words like the, a, is are indexed and searchable just like any other word. Queries containing these words will match more documents and score them lower (BM25 down-weights high-frequency terms automatically).
Multi-Word Queries — OR Semantics
Every term in the query is looked up independently. A document is included in results if it contains any of the query terms. Documents that match more terms, or match rarer terms, receive higher BM25 scores.
-- Matches documents containing "wireless" OR "headphones" (or both)
-- Documents matching both terms score higher
SELECT * FROM TABLE(fts_metadata.TOKENIZE_TEXT_C('wireless headphones')) AS t(token, pos);
-- Returns: 'wireless', 'headphones'
There is no way to require that all terms appear in a document — this is a BM25 scoring system, not a boolean filter. Use the score to control result quality rather than trying to enforce term presence.
What Is NOT Supported
| Feature | Workaround |
|---|---|
| AND — require all terms | Not supported. Filter by minimum score to approximate. |
| NOT — exclude a term | Not supported. Filter results in the outer query. |
| "phrase" — exact phrase | Not supported. Individual words are matched independently. |
| wild* — prefix/wildcard | Not supported. Exact tokens only. |
| field:term — field-scoped | Not supported. All indexed columns are searched together. |
| Minimum score threshold | Use WHERE score > N in Pattern 1/2, or HAVING in the subquery. |
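Besides score filtering, AND semantics can also be approximated client-side: run a Pattern 1 query that additionally selects ti.term, then keep only documents whose matched terms cover every query token. A Python sketch, with hypothetical result rows standing in for the fetched data:

```python
from collections import defaultdict

# Hypothetical (document_id, matched_term) rows, as a Pattern 1 query
# would return if it also selected ti.term.
rows = [
    ('1', 'wireless'), ('1', 'headphones'),
    ('2', 'wireless'),
    ('3', 'headphones'), ('3', 'wireless'),
]
query_terms = {'wireless', 'headphones'}

# Collect the set of query terms each document matched
matched = defaultdict(set)
for doc_id, term in rows:
    matched[doc_id].add(term)

# AND semantics: keep documents that matched every query term
and_docs = sorted(d for d, terms in matched.items()
                  if terms >= query_terms)
assert and_docs == ['1', '3']
```

The SQL equivalent of the same idea would be grouping by document and requiring the distinct matched-term count to equal the number of query tokens, at the cost of an extra aggregation.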
Examples
Single-word search
-- All products mentioning "bluetooth"
SELECT ti.document_id
FROM TABLE(fts_metadata.TOKENIZE_TEXT_C('bluetooth')) AS qt(token, pos)
INNER JOIN fts_metadata.fts_term_index ti
ON ti.term = qt.token AND ti.index_name = 'products_idx';
Multi-word search (OR, ranked)
-- Products matching "wireless", "bluetooth", or "headphones" — best matches first
SELECT
p.product_id,
p.name,
scores.score
FROM (
SELECT
ti.document_id,
SUM(
fts_metadata.BM25_SCORE(
ti.term_frequency,
ds.doc_length,
idx.avg_doc_length,
ts.document_frequency,
idx.document_count
)
) AS score
FROM TABLE(fts_metadata.TOKENIZE_TEXT_C('wireless bluetooth headphones')) AS qt(token, pos)
INNER JOIN fts_metadata.fts_term_index ti
ON ti.term = qt.token AND ti.index_name = 'products_idx'
INNER JOIN fts_metadata.fts_document_stats ds
ON ds.index_name = ti.index_name AND ds.document_id = ti.document_id
INNER JOIN fts_metadata.fts_term_stats ts
ON ts.index_name = ti.index_name AND ts.term = ti.term
CROSS JOIN fts_metadata.fts_indexes idx
WHERE idx.index_name = 'products_idx'
GROUP BY ti.document_id
) scores
INNER JOIN mydb.products p
ON CAST(p.product_id AS VARCHAR(512)) = scores.document_id
ORDER BY scores.score DESC
SAMPLE 10;
Filtering weak matches by score
-- Only return results with a meaningful relevance score
SELECT p.product_id, p.name, scores.score
FROM ( ... ) scores
INNER JOIN mydb.products p ON CAST(p.product_id AS VARCHAR(512)) = scores.document_id
WHERE scores.score > 1.0
ORDER BY scores.score DESC;
Checking how a query is tokenized
-- Useful for debugging unexpected search results
SELECT token, pos
FROM TABLE(fts_metadata.TOKENIZE_TEXT_C('your search query here')) AS t(token, pos)
ORDER BY pos;
Handling Real-World Data Issues
Invalid UTF-8 Characters
The indexing system automatically handles documents containing invalid UTF-8 byte sequences:
What happens during indexing:
- Documents are processed sequentially
- If invalid UTF-8 is detected during tokenization, that document is skipped
- Processing continues with remaining documents
- Final result message reports the count of skipped documents
Example:
CALL fts_metadata.CREATE_FTS_INDEX_C('my_index', DATABASE, 'my_table', 'id', 'text_column', result_msg);
-- Result: "Success: 1000 documents (5 skipped due to invalid UTF-8), 4874 terms"
What this means:
- 995 documents were indexed successfully
- 5 documents contained invalid UTF-8 and were skipped
- The index is usable and contains 4874 unique terms
Important notes:
- Skipped documents will not appear in search results
- No manual intervention is required
- If you need to index all documents, fix the UTF-8 encoding in your source data first
- Common causes: binary data in text fields, incorrect character set conversions, legacy encoding issues
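If you want to locate offending rows before indexing, a client-side pre-check can be sketched in Python. This assumes you can fetch the raw bytes of the text column (for example via an export); it is a diagnostic aid, not part of the FTS system:

```python
def is_invalid_utf8(raw: bytes) -> bool:
    """Return True if the byte string is not valid UTF-8."""
    try:
        raw.decode('utf-8')
        return False
    except UnicodeDecodeError:
        return True

# Valid UTF-8 passes; a stray 0xFF byte (common binary-in-text
# contamination) fails.
assert is_invalid_utf8('café'.encode('utf-8')) is False
assert is_invalid_utf8(b'\xff\xfe broken') is True
```

Rows flagged this way can be repaired (e.g. re-encoded with `errors='replace'`) before running CREATE_FTS_INDEX_C so that no documents are skipped.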
Performance: The _C path processes approximately 7 documents per second (~2m 22s for 1000 docs). The _V path is ~3× faster (~45s for 1000 docs) when data is known to be clean UTF-8.
Performance Tips
- Use unranked search (Pattern 1) when you don't need scoring - much faster
- Adjust HAVING threshold (e.g., 0.5) to filter weak matches
- Use SAMPLE N instead of TOP N for better performance
- Index only searchable columns - Don't over-index
- Rebuild indexes periodically after bulk data changes
Examples
See sql/examples.sql for complete working examples including:
- Test data creation
- Index creation and management
- All three search patterns
- Tokenization examples
- Performance tips
Troubleshooting
Problem: "Database 'fts_metadata' does not exist" Solution: The schema is created automatically. If it fails:
CREATE DATABASE fts_metadata AS PERMANENT = 1e9;
Then redeploy.
Problem: "CREATE DATABASE failed - insufficient privileges" Solution: Your user needs CREATE DATABASE privilege. Contact your DBA to grant:
GRANT CREATE DATABASE TO your_username;
Or have the DBA create the database and grant you access:
CREATE DATABASE fts_metadata AS PERMANENT = 1e9;
GRANT ALL ON fts_metadata TO your_username;
Problem: "The user does not have CREATE FUNCTION access to database fts_metadata" Solution: After creating the database, grant yourself all privileges:
GRANT ALL ON fts_metadata TO your_username;
Then redeploy.
Problem: "CREATE TABLE/PROCEDURE/FUNCTION failed - access denied" Solution: Same as above - ensure you have all necessary privileges on the fts_metadata database:
GRANT ALL ON fts_metadata TO your_username;
Problem: "Stored Procedure Created with Warnings" Solution: This is normal for procedures that call other procedures. Safe to ignore.
Problem: Deployment fails partway through Solution: Drop everything and redeploy:
DELETE DATABASE fts_metadata;
DROP DATABASE fts_metadata;
Then run ./deploy.sh again.
Problem: Search returns no results Check:
- You are using TOKENIZE_TEXT_C in your query, not TOKENIZE_TEXT_V (see note in the Searching section)
- Index exists: SELECT * FROM fts_metadata.v_fts_indexes;
- Index has data: SELECT COUNT(*) FROM fts_metadata.fts_term_index WHERE index_name = 'your_index';
- Search terms are being tokenized: SELECT * FROM TABLE(fts_metadata.TOKENIZE_TEXT_C('your search')) AS t(token, pos);
Problem: Some documents don't appear in search results Possible causes:
- Documents were skipped due to invalid UTF-8 encoding (check the result message from CREATE_FTS_INDEX_C; the _V variant will fail rather than skip)
- Documents don't contain any of the search terms after tokenization (punctuation is stripped; hyphens split words, e.g. "e-mail" becomes "e" and "mail")
- Documents exist but their text columns are NULL or empty
How It Works
BM25 Scoring
BM25 (Best Matching 25) is a probabilistic ranking algorithm that ranks documents by relevance to a query. It improves on TF-IDF by:
- Limiting the impact of term frequency (saturation)
- Normalizing for document length
- Weighting rare terms higher than common ones
Score Range:
- BM25 scores are unbounded (not normalized to 0-100)
- Minimum score: 0.0 (no match or zero term frequency)
- Maximum score: No fixed upper limit
- Typical ranges:
- Single term match: 0.5 to 5.0
- Multi-term queries: 2.0 to 15.0 (sum of individual term scores)
- Rare terms score higher than common terms
- Scores depend on term frequency, document length, and corpus statistics
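The shape of the formula can be sketched in Python. The BM25_SCORE UDF's exact constants are not documented here, so this sketch uses the textbook Okapi BM25 form with the common defaults k1 = 1.2 and b = 0.75, taking the same five inputs the UDF receives:

```python
import math

def bm25_term(tf, doc_len, avg_doc_len, doc_freq, doc_count,
              k1=1.2, b=0.75):
    """One term's BM25 contribution, mirroring BM25_SCORE's argument
    list. k1/b are textbook defaults; the UDF's actual constants may
    differ."""
    # Rare terms (low doc_freq relative to doc_count) get a high IDF
    idf = math.log((doc_count - doc_freq + 0.5) / (doc_freq + 0.5) + 1.0)
    # Term frequency saturates, normalized by document length
    tf_norm = (tf * (k1 + 1)
               / (tf + k1 * (1 - b + b * doc_len / avg_doc_len)))
    return idf * tf_norm

# Rare terms contribute more than common ones, all else equal
rare = bm25_term(tf=3, doc_len=100, avg_doc_len=120,
                 doc_freq=5, doc_count=1000)
common = bm25_term(tf=3, doc_len=100, avg_doc_len=120,
                   doc_freq=800, doc_count=1000)
assert rare > common > 0
```

The query's total score is the sum of these per-term contributions, which is why multi-term queries land in a higher typical range.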
Tokenization
The TOKENIZE_TEXT_C / TOKENIZE_TEXT_V functions:
- Split on whitespace and punctuation (hyphens, apostrophes, periods, etc.)
- Convert ASCII letters to lowercase (non-ASCII characters are preserved as-is)
- Include underscores as part of a token
- Return individual tokens with byte positions
- Do not filter stop words — common words are indexed normally
Inverted Index
Search is fast because we pre-compute an inverted index:
term → [document_id, term_frequency]
This allows quick lookup of which documents contain which terms.
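The mapping can be sketched as a toy in-memory version of fts_term_index (illustrative only, not the actual table layout):

```python
from collections import Counter, defaultdict

# Tiny corpus: document_id -> its tokens
docs = {
    '1': ['wireless', 'headphones', 'wireless'],
    '2': ['wired', 'headphones'],
}

# Build the inverted index: term -> {document_id: term_frequency}
index = defaultdict(dict)
for doc_id, tokens in docs.items():
    for term, tf in Counter(tokens).items():
        index[term][doc_id] = tf

# Lookup is a single dictionary access per query term, independent
# of how many documents exist.
assert index['wireless'] == {'1': 2}
assert index['headphones'] == {'1': 1, '2': 1}
```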
Uninstalling
To completely remove the full-text search system from your Teradata instance:
bteq < sql/99_uninstall.sql
Or run manually in BTEQ:
.run file = sql/99_uninstall.sql
This will remove:
- All stored procedures (CREATE_FTS_INDEX_C, CREATE_FTS_INDEX_V, DROP_FTS_INDEX, REBUILD_FTS_INDEX_C, REBUILD_FTS_INDEX_V, INDEX_SINGLE_DOCUMENT_C, INDEX_SINGLE_DOCUMENT_V, DELETE_SINGLE_DOCUMENT, UPDATE_SINGLE_DOCUMENT_C, UPDATE_SINGLE_DOCUMENT_V, SEARCH_FTS)
- All views (v_fts_indexes, v_fts_indexes_current, v_fts_index_columns)
- All UDFs (TOKENIZE_TEXT_C, TOKENIZE_TEXT_V, BM25_SCORE)
- All metadata tables (fts_indexes, fts_term_index, fts_document_stats, fts_term_stats)
- Global temporary tables (fts_temp_docs_c, fts_temp_docs_v)
- The entire fts_metadata database
Warning: This operation is permanent and cannot be undone. All indexed data will be lost. Your source tables are not affected.
Limitations
- No stop word filtering: Common words (the, a, is, etc.) are indexed and matched like any other term
- No phrase matching: Searches for individual terms, not exact phrases
- No wildcards: Exact term matching only
- Manual rebuild: Must call REBUILD_FTS_INDEX_C or REBUILD_FTS_INDEX_V after data changes
- UTF-8 validation: Documents with invalid UTF-8 are automatically skipped during indexing
- Teradata 17: SEARCH_FTS stored procedure not available (use direct SQL patterns 1-3 instead)
Support
For questions or issues:
- Review sql/examples.sql for working code
- Check Teradata logs for detailed error messages
- Verify prerequisites and permissions
Version
Check the filename of your distribution package for version information:
teradata-fulltext-YYYY-MM-version.zip