Skip to content

Conversation

@danielclow
Copy link
Contributor

testing, do not merge

@gemini-code-assist
Copy link
Contributor

Summary of Changes

Hello @danielclow, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request aims to improve the organization and accessibility of User-Defined Function (UDF) sample code by relocating it from a testing-related directory to a dedicated documentation examples folder. This change ensures that the example code is directly associated with the documentation, making it easier for users to find and utilize while learning about UDFs. The corresponding documentation files have been updated to correctly reference these newly located examples.

Highlights

  • UDF Example Relocation: User-Defined Function (UDF) sample code has been moved from the test/cases/12-UDFs/sh/ directory to a new, dedicated docs/examples/udf/ directory.
  • Documentation Updates: The English and Chinese UDF documentation files (docs/en/07-develop/09-udf.md and docs/zh/07-develop/09-udf.md) have been updated to reflect the new paths for the included C and Python UDF examples, and direct GitHub links have been removed from the example titles.
  • New Example Files: Several C and Python UDF example files, along with compilation and environment preparation scripts, have been added to the new docs/examples/udf/ directory to support the updated documentation.
Using Gemini Code Assist

The full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips.

Invoking Gemini

You can request assistance from Gemini at any point by creating a comment using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands on the current page.

Feature Command Description
Code Review /gemini review Performs a code review for the current pull request in its current state.
Pull Request Summary /gemini summary Provides a summary of the current pull request in its current state.
Comment @gemini-code-assist Responds in comments when explicitly tagged, both in pull request comments and review comments.
Help /gemini help Displays a list of available commands.

Customization

To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a .gemini/ folder in the base of the repository. Detailed instructions can be found here.

Limitations & Feedback

Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for Github and other Google products, sign up here.

You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension.

Footnotes

  1. Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.

Copy link
Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request refactors the UDF example code by moving it into the docs/examples directory and updating the documentation to reference the new location. This is a good organizational change. I've reviewed the moved code files and found several issues, including critical security vulnerabilities like buffer overflows in C examples, and correctness bugs in Python examples. There are also issues with broken build scripts and other less severe problems. Please address these issues to ensure the example code is safe, correct, and easy to use.

Comment on lines +38 to +39
memcpy(dbName, varDataVal(varDbname), varDataLen(varDbname));
memcpy(tblName, varDataVal(varTbname), varDataLen(varTbname));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

critical

There is a critical buffer overflow vulnerability here. dbName and tblName are fixed-size buffers of 256 bytes, but memcpy copies varDataLen(...) bytes without checking if the length exceeds the buffer size. This can lead to a buffer overflow if the database or table names are too long. Use a safe copy function like snprintf to prevent this.

    snprintf(dbName, sizeof(dbName), "%.*s", (int)varDataLen(varDbname), varDataVal(varDbname));
    snprintf(tblName, sizeof(tblName), "%.*s", (int)varDataLen(varTbname), varDataVal(varTbname));

Comment on lines +67 to +69
char data[64];
memset(data, 0, 64);
memcpy(data, varDataVal(buf), varDataLen(buf));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

critical

There is a critical buffer overflow vulnerability here. The data buffer is allocated on the stack with a fixed size of 64 bytes, but memcpy copies varDataLen(buf) bytes into it. According to the documentation, the input varchar can be up to 128 bytes, which will overflow the buffer. You should use a safe copy mechanism, for example by using a correctly-sized buffer and snprintf.

        char data[129];
        snprintf(data, sizeof(data), "%.*s", (int)varDataLen(buf), varDataVal(buf));

Comment on lines +16 to +29
def reduce(datablock, buf):
(rows, cols) = datablock.shape()
state = pickle.loads(buf)
row = []
for i in range(rows):
for j in range(cols):
cell = datablock.data(i, j)
if cell is not None:
row.append(datablock.data(i, j))
if len(row) > 1:
new_state = np.cumsum(row)[-1]
else:
new_state = state
return pickle.dumps(new_state)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

critical

The logic in the reduce function is incorrect for calculating a cumulative sum.

  1. It recalculates the sum of the current block (np.cumsum(row)[-1]) but does not add it to the existing state. The state from the previous block is ignored.
  2. The condition if len(row) > 1 causes it to ignore data blocks that contain only a single non-null value.

The function should sum the values in the current block and add them to the state.

def reduce(datablock, buf):
    (rows, cols) = datablock.shape()
    state = pickle.loads(buf)

    for i in range(rows):
        for j in range(cols):
            cell = datablock.data(i, j)
            if cell is not None:
                state += cell

    return pickle.dumps(state)

char *str = buf->buf + sizeof(float);
// copy to des
char *des = resultData->buf + sizeof(uint16_t);
strcpy(des, str);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

critical

Using strcpy here is unsafe and can lead to a buffer overflow. The source string str can be up to 256 bytes, but the create function statement in the documentation specifies binary(64). If the constructed string is longer than the output buffer size, strcpy will write out of bounds. Use a safe function like snprintf to prevent this.

  snprintf(des, resultData->bufLen - sizeof(uint16_t), "%s", str);

Comment on lines +13 to +16
if (block->numOfCols < 2) {
udfError("block:%p, cols:%d needs to be greater than 2", block, block->numOfCols);
return TSDB_CODE_UDF_INVALID_INPUT;
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

high

The column count check is incorrect. The documentation states that the function should work for a single column, but the current check block->numOfCols < 2 causes it to fail with an error for one column. Additionally, the error message needs to be greater than 2 is confusing and inconsistent with the check. The check should be for at least one column.

  if (block->numOfCols < 1) {
    udfError("block:%p, cols:%d needs to be greater than or equal to 1", block, block->numOfCols);
    return TSDB_CODE_UDF_INVALID_INPUT;
  }

char tblName[256] = {0};
memcpy(dbName, varDataVal(varDbname), varDataLen(varDbname));
memcpy(tblName, varDataVal(varTbname), varDataLen(varTbname));
printf("%s, %s\n", dbName, tblName);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

The use of printf is generally discouraged in UDFs as it writes to the standard output of the UDF process, which may not be monitored. It's better to use the provided logging functions like udfTrace for debugging information.

    udfTrace("%s, %s", dbName, tblName);

}

double sumSquares = *(double*)interBuf->buf;
int8_t numNotNull = 0;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

The variable numNotNull is incremented but its value is never used. This is dead code and should be removed.

Comment on lines +3 to +5
rm -rf /tmp/udf/libbitand.so /tmp/udf/libsqrsum.so /tmp/udf/libgpd.so
mkdir -p /tmp/udf
echo "compile udf bit_and and sqr_sum"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

The script has inconsistencies. It attempts to remove libsqrsum.so and the echo message mentions sqr_sum, but this UDF is not compiled in this script. The compiled UDF is l2norm. Please update the script to be consistent.

Suggested change
rm -rf /tmp/udf/libbitand.so /tmp/udf/libsqrsum.so /tmp/udf/libgpd.so
mkdir -p /tmp/udf
echo "compile udf bit_and and sqr_sum"
rm -rf /tmp/udf/libbitand.so /tmp/udf/libl2norm.so /tmp/udf/libgpd.so
mkdir -p /tmp/udf
echo "compile udf bit_and, l2norm and gpd"

FILE=/usr/local/lib/libtaospyudf.so
if [ ! -f "$FILE" ]; then
echo "$FILE does not exist."
apt install -y python3 python3-dev python3-venv
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

The command apt install is specific to Debian-based Linux distributions (like Ubuntu). For a general example, it would be helpful to mention this and perhaps suggest equivalent commands for other popular package managers like yum or dnf for Fedora/CentOS/RHEL.

SUdfColumnData *resultData = &resultCol->colData;
resultData->numOfRows = block->numOfRows;
for (int32_t i = 0; i < resultData->numOfRows; ++i) {
int64_t* calc_ts = (int64_t*)udfColDataGetData(block->udfCols[0], i);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

The variable calc_ts is assigned a value but is never used. This is dead code and should be removed to improve clarity.

@danielclow danielclow closed this Dec 5, 2025
@danielclow danielclow reopened this Dec 5, 2025
@danielclow danielclow closed this Dec 8, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants