docs: move udf sample code to docs #33819

danielclow · 2025-12-04T12:47:35Z

testing, do not merge

gemini-code-assist · 2025-12-04T12:47:56Z

Summary of Changes

Hello @danielclow, I'm Gemini Code Assist¹! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request aims to improve the organization and accessibility of User-Defined Function (UDF) sample code by relocating it from a testing-related directory to a dedicated documentation examples folder. This change ensures that the example code is directly associated with the documentation, making it easier for users to find and utilize while learning about UDFs. The corresponding documentation files have been updated to correctly reference these newly located examples.

Highlights

UDF Example Relocation: User-Defined Function (UDF) sample code has been moved from the test/cases/12-UDFs/sh/ directory to a new, dedicated docs/examples/udf/ directory.
Documentation Updates: The English and Chinese UDF documentation files (docs/en/07-develop/09-udf.md and docs/zh/07-develop/09-udf.md) have been updated to reflect the new paths for the included C and Python UDF examples, and direct GitHub links have been removed from the example titles.
New Example Files: Several C and Python UDF example files, along with compilation and environment preparation scripts, have been added to the new docs/examples/udf/ directory to support the updated documentation.

Using Gemini Code Assist

The full guide for Gemini Code Assist can be found on our documentation page. Here are some quick tips.

Invoking Gemini

You can request assistance from Gemini at any point by creating a comment using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands on the current page.

Feature	Command	Description
Code Review	`/gemini review`	Performs a code review for the current pull request in its current state.
Pull Request Summary	`/gemini summary`	Provides a summary of the current pull request in its current state.
Comment	@gemini-code-assist	Responds in comments when explicitly tagged, both in pull request comments and review comments.
Help	`/gemini help`	Displays a list of available commands.

Customization

To customize the Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by adding files to a .gemini/ folder in the base of the repository. Detailed instructions can be found here.

Limitations & Feedback

Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counterproductive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving feedback about your experience with Gemini Code Assist for GitHub and other Google products, sign up here.

You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension.

¹ Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.

gemini-code-assist

Code Review

This pull request refactors the UDF example code by moving it into the docs/examples directory and updating the documentation to reference the new location. This is a good organizational change. I've reviewed the moved code files and found several issues, including critical security vulnerabilities like buffer overflows in C examples, and correctness bugs in Python examples. There are also issues with broken build scripts and other less severe problems. Please address these issues to ensure the example code is safe, correct, and easy to use.

gemini-code-assist · 2025-12-04T12:51:19Z

docs/examples/udf/gpd.c

+    memcpy(dbName, varDataVal(varDbname), varDataLen(varDbname));
+    memcpy(tblName, varDataVal(varTbname), varDataLen(varTbname));


There is a critical buffer overflow vulnerability here. dbName and tblName are fixed-size buffers of 256 bytes, but memcpy copies varDataLen(...) bytes without checking if the length exceeds the buffer size. This can lead to a buffer overflow if the database or table names are too long. Use a safe copy function like snprintf to prevent this.

snprintf(dbName, sizeof(dbName), "%.*s", (int)varDataLen(varDbname), varDataVal(varDbname));
snprintf(tblName, sizeof(tblName), "%.*s", (int)varDataLen(varTbname), varDataVal(varTbname));
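To illustrate why the suggested "%.*s" pattern is safe for a length-delimited value, here is a minimal standalone sketch. `VarcharView` and `varchar_to_cstr` are hypothetical stand-ins invented for this example; they are not part of the TDengine UDF API, and the real `varDataLen`/`varDataVal` macros are not reproduced here. The sketch assumes the length fits in an `int`.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical stand-in for a length-delimited varchar value:
 * `len` bytes starting at `val`, with no NUL terminator required. */
typedef struct {
  size_t      len;
  const char *val;
} VarcharView;

/* Copies the value into dst as a C string. dst is always NUL-terminated
 * when dst_size > 0. Returns 1 if the whole value fit, 0 if it was
 * truncated or formatting failed. Assumes v.len fits in an int. */
static int varchar_to_cstr(char *dst, size_t dst_size, VarcharView v) {
  /* The precision in "%.*s" caps how many bytes are read from v.val,
   * and dst_size caps how many bytes are written to dst. */
  int written = snprintf(dst, dst_size, "%.*s", (int)v.len, v.val);
  return written >= 0 && (size_t)written < dst_size;
}
```

A caller can treat a return value of 0 as an oversized name and report an error instead of silently working with a truncated string.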

gemini-code-assist · 2025-12-04T12:51:19Z

docs/examples/udf/extract_avg.c

+        char data[64];
+        memset(data, 0, 64);
+        memcpy(data, varDataVal(buf), varDataLen(buf));


There is a critical buffer overflow vulnerability here. The data buffer is allocated on the stack with a fixed size of 64 bytes, but memcpy copies varDataLen(buf) bytes into it. According to the documentation, the input varchar can be up to 128 bytes, which will overflow the buffer. You should use a safe copy mechanism, for example by using a correctly-sized buffer and snprintf.

char data[129];
snprintf(data, sizeof(data), "%.*s", (int)varDataLen(buf), varDataVal(buf));

gemini-code-assist · 2025-12-04T12:51:20Z

docs/examples/udf/pycumsum.py

+def reduce(datablock, buf):
+    (rows, cols) = datablock.shape()
+    state = pickle.loads(buf)
+    row = []
+    for i in range(rows):
+        for j in range(cols):
+            cell = datablock.data(i, j)
+            if cell is not None:
+                row.append(datablock.data(i, j))
+    if len(row) > 1:
+        new_state = np.cumsum(row)[-1]
+    else:
+        new_state = state
+    return pickle.dumps(new_state)


The logic in the reduce function is incorrect for calculating a cumulative sum.

It recalculates the sum of the current block (np.cumsum(row)[-1]) but does not add it to the existing state. The state from the previous block is ignored.

The condition if len(row) > 1 causes it to ignore data blocks that contain only a single non-null value.

The function should sum the values in the current block and add them to the state.

def reduce(datablock, buf):
    (rows, cols) = datablock.shape()
    state = pickle.loads(buf)
    for i in range(rows):
        for j in range(cols):
            cell = datablock.data(i, j)
            if cell is not None:
                state += cell
    return pickle.dumps(state)

gemini-code-assist · 2025-12-04T12:51:20Z

docs/examples/udf/max_vol.c

+  char *str = buf->buf + sizeof(float);
+  // copy to des
+  char *des = resultData->buf + sizeof(uint16_t);
+  strcpy(des, str);


Using strcpy here is unsafe and can lead to a buffer overflow. The source string str can be up to 256 bytes, but the create function statement in the documentation specifies binary(64). If the constructed string is longer than the output buffer size, strcpy will write out of bounds. Use a safe function like snprintf to prevent this.

snprintf(des, resultData->bufLen - sizeof(uint16_t), "%s", str);
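As a rough sketch of how the bounded write could also report truncation, the standalone helper below writes a string after a 2-byte header and checks the `snprintf` return value. `write_prefixed` is a hypothetical name, and treating the `uint16_t` header as a payload length is an assumption made for illustration; the actual layout used by `max_vol.c` is not confirmed by this thread. It also guards the case where the buffer is not larger than the header, which would otherwise make the size subtraction wrap around.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Writes `str` after a uint16_t header in `buf` and stores the payload
 * length in that header (assumed layout, for illustration only).
 * Returns the payload length, or -1 if the string does not fit; on
 * failure the header is left untouched. */
static int write_prefixed(char *buf, size_t buf_len, const char *str) {
  const size_t header = sizeof(uint16_t);
  if (buf_len <= header) return -1;  /* avoids unsigned wrap-around below */

  size_t cap = buf_len - header;     /* payload capacity, including the NUL */
  int needed = snprintf(buf + header, cap, "%s", str);
  if (needed < 0 || (size_t)needed >= cap || needed > UINT16_MAX) {
    return -1;                       /* truncated or formatting error */
  }

  uint16_t len = (uint16_t)needed;
  memcpy(buf, &len, sizeof(len));    /* memcpy sidesteps alignment concerns */
  return needed;
}
```

Returning an error on truncation, rather than accepting a shortened string, is a design choice; a UDF might instead prefer to emit the truncated value and log a warning.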

gemini-code-assist · 2025-12-04T12:51:20Z

docs/examples/udf/bit_and.c

+  if (block->numOfCols < 2) {
+    udfError("block:%p, cols:%d needs to be greater than 2", block, block->numOfCols);
+    return TSDB_CODE_UDF_INVALID_INPUT;
+  }


The column count check is incorrect. The documentation states that the function should work for a single column, but the current check block->numOfCols < 2 causes it to fail with an error for one column. Additionally, the error message "needs to be greater than 2" is confusing and inconsistent with the check. The check should be for at least one column.

if (block->numOfCols < 1) {
  udfError("block:%p, cols:%d needs to be greater than or equal to 1", block, block->numOfCols);
  return TSDB_CODE_UDF_INVALID_INPUT;
}

gemini-code-assist · 2025-12-04T12:51:21Z

docs/examples/udf/gpd.c

+    char tblName[256] = {0};      
+    memcpy(dbName, varDataVal(varDbname), varDataLen(varDbname));
+    memcpy(tblName, varDataVal(varTbname), varDataLen(varTbname));
+    printf("%s, %s\n", dbName, tblName);    


The use of printf is generally discouraged in UDFs as it writes to the standard output of the UDF process, which may not be monitored. It's better to use the provided logging functions like udfTrace for debugging information.

udfTrace("%s, %s", dbName, tblName);

gemini-code-assist · 2025-12-04T12:51:21Z

docs/examples/udf/l2norm.c

+  }
+
+  double sumSquares = *(double*)interBuf->buf;
+  int8_t numNotNull = 0;


The variable numNotNull is incremented but its value is never used. This is dead code and should be removed.

gemini-code-assist · 2025-12-04T12:51:21Z

docs/examples/udf/compile_udf.sh

+rm -rf /tmp/udf/libbitand.so /tmp/udf/libsqrsum.so /tmp/udf/libgpd.so
+mkdir -p /tmp/udf
+echo "compile udf bit_and and sqr_sum"


The script has inconsistencies. It attempts to remove libsqrsum.so and the echo message mentions sqr_sum, but this UDF is not compiled in this script. The compiled UDF is l2norm. Please update the script to be consistent.

Suggested change

-rm -rf /tmp/udf/libbitand.so /tmp/udf/libsqrsum.so /tmp/udf/libgpd.so
-mkdir -p /tmp/udf
-echo "compile udf bit_and and sqr_sum"
+rm -rf /tmp/udf/libbitand.so /tmp/udf/libl2norm.so /tmp/udf/libgpd.so
+mkdir -p /tmp/udf
+echo "compile udf bit_and, l2norm and gpd"

gemini-code-assist · 2025-12-04T12:51:21Z

docs/examples/udf/prepare_pyudf.sh

+FILE=/usr/local/lib/libtaospyudf.so
+if [ ! -f "$FILE" ]; then
+    echo "$FILE does not exist."
+    apt install -y python3 python3-dev python3-venv


The command apt install is specific to Debian-based Linux distributions (like Ubuntu). For a general example, it would be helpful to mention this and perhaps suggest equivalent commands for other popular package managers like yum or dnf for Fedora/CentOS/RHEL.

gemini-code-assist · 2025-12-04T12:51:21Z

docs/examples/udf/gpd.c

+  SUdfColumnData *resultData = &resultCol->colData;
+  resultData->numOfRows = block->numOfRows;
+  for (int32_t i = 0; i < resultData->numOfRows; ++i) {
+    int64_t* calc_ts = (int64_t*)udfColDataGetData(block->udfCols[0], i);


The variable calc_ts is assigned a value but is never used. This is dead code and should be removed to improve clarity.

docs: move udf sample code to docs

125b915

danielclow requested review from guanshengliang and zitsen as code owners December 4, 2025 12:47

taosdata-bot bot added the internal label Dec 4, 2025

gemini-code-assist bot reviewed Dec 4, 2025

danielclow closed this Dec 5, 2025

danielclow reopened this Dec 5, 2025

github pr rerun checks

b104e05

danielclow closed this Dec 8, 2025
