-
Notifications
You must be signed in to change notification settings - Fork 5k
docs: move udf sample code to docs #33819
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Summary of ChangesHello @danielclow, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed! This pull request aims to improve the organization and accessibility of User-Defined Function (UDF) sample code by relocating it from a testing-related directory to a dedicated documentation examples folder. This change ensures that the example code is directly associated with the documentation, making it easier for users to find and utilize while learning about UDFs. The corresponding documentation files have been updated to correctly reference these newly located examples. Highlights
Using Gemini Code AssistThe full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips. Invoking Gemini You can request assistance from Gemini at any point by creating a comment using either
Customization To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a Limitations & Feedback Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for Github and other Google products, sign up here. You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension. Footnotes
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Code Review
This pull request refactors the UDF example code by moving it into the docs/examples directory and updating the documentation to reference the new location. This is a good organizational change. I've reviewed the moved code files and found several issues, including critical security vulnerabilities like buffer overflows in C examples, and correctness bugs in Python examples. There are also issues with broken build scripts and other less severe problems. Please address these issues to ensure the example code is safe, correct, and easy to use.
| memcpy(dbName, varDataVal(varDbname), varDataLen(varDbname)); | ||
| memcpy(tblName, varDataVal(varTbname), varDataLen(varTbname)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There is a critical buffer overflow vulnerability here. dbName and tblName are fixed-size buffers of 256 bytes, but memcpy copies varDataLen(...) bytes without checking if the length exceeds the buffer size. This can lead to a buffer overflow if the database or table names are too long. Use a safe copy function like snprintf to prevent this.
snprintf(dbName, sizeof(dbName), "%.*s", (int)varDataLen(varDbname), varDataVal(varDbname));
snprintf(tblName, sizeof(tblName), "%.*s", (int)varDataLen(varTbname), varDataVal(varTbname));| char data[64]; | ||
| memset(data, 0, 64); | ||
| memcpy(data, varDataVal(buf), varDataLen(buf)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There is a critical buffer overflow vulnerability here. The data buffer is allocated on the stack with a fixed size of 64 bytes, but memcpy copies varDataLen(buf) bytes into it. According to the documentation, the input varchar can be up to 128 bytes, which will overflow the buffer. You should use a safe copy mechanism, for example by using a correctly-sized buffer and snprintf.
char data[129];
snprintf(data, sizeof(data), "%.*s", (int)varDataLen(buf), varDataVal(buf));| def reduce(datablock, buf): | ||
| (rows, cols) = datablock.shape() | ||
| state = pickle.loads(buf) | ||
| row = [] | ||
| for i in range(rows): | ||
| for j in range(cols): | ||
| cell = datablock.data(i, j) | ||
| if cell is not None: | ||
| row.append(datablock.data(i, j)) | ||
| if len(row) > 1: | ||
| new_state = np.cumsum(row)[-1] | ||
| else: | ||
| new_state = state | ||
| return pickle.dumps(new_state) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The logic in the reduce function is incorrect for calculating a cumulative sum.
- It recalculates the sum of the current block (
np.cumsum(row)[-1]) but does not add it to the existingstate. The state from the previous block is ignored. - The condition
if len(row) > 1causes it to ignore data blocks that contain only a single non-null value.
The function should sum the values in the current block and add them to the state.
def reduce(datablock, buf):
(rows, cols) = datablock.shape()
state = pickle.loads(buf)
for i in range(rows):
for j in range(cols):
cell = datablock.data(i, j)
if cell is not None:
state += cell
return pickle.dumps(state)| char *str = buf->buf + sizeof(float); | ||
| // copy to des | ||
| char *des = resultData->buf + sizeof(uint16_t); | ||
| strcpy(des, str); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Using strcpy here is unsafe and can lead to a buffer overflow. The source string str can be up to 256 bytes, but the create function statement in the documentation specifies binary(64). If the constructed string is longer than the output buffer size, strcpy will write out of bounds. Use a safe function like snprintf to prevent this.
snprintf(des, resultData->bufLen - sizeof(uint16_t), "%s", str);| if (block->numOfCols < 2) { | ||
| udfError("block:%p, cols:%d needs to be greater than 2", block, block->numOfCols); | ||
| return TSDB_CODE_UDF_INVALID_INPUT; | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The column count check is incorrect. The documentation states that the function should work for a single column, but the current check block->numOfCols < 2 causes it to fail with an error for one column. Additionally, the error message needs to be greater than 2 is confusing and inconsistent with the check. The check should be for at least one column.
if (block->numOfCols < 1) {
udfError("block:%p, cols:%d needs to be greater than or equal to 1", block, block->numOfCols);
return TSDB_CODE_UDF_INVALID_INPUT;
}| char tblName[256] = {0}; | ||
| memcpy(dbName, varDataVal(varDbname), varDataLen(varDbname)); | ||
| memcpy(tblName, varDataVal(varTbname), varDataLen(varTbname)); | ||
| printf("%s, %s\n", dbName, tblName); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| } | ||
|
|
||
| double sumSquares = *(double*)interBuf->buf; | ||
| int8_t numNotNull = 0; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| rm -rf /tmp/udf/libbitand.so /tmp/udf/libsqrsum.so /tmp/udf/libgpd.so | ||
| mkdir -p /tmp/udf | ||
| echo "compile udf bit_and and sqr_sum" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The script has inconsistencies. It attempts to remove libsqrsum.so and the echo message mentions sqr_sum, but this UDF is not compiled in this script. The compiled UDF is l2norm. Please update the script to be consistent.
| rm -rf /tmp/udf/libbitand.so /tmp/udf/libsqrsum.so /tmp/udf/libgpd.so | |
| mkdir -p /tmp/udf | |
| echo "compile udf bit_and and sqr_sum" | |
| rm -rf /tmp/udf/libbitand.so /tmp/udf/libl2norm.so /tmp/udf/libgpd.so | |
| mkdir -p /tmp/udf | |
| echo "compile udf bit_and, l2norm and gpd" |
| FILE=/usr/local/lib/libtaospyudf.so | ||
| if [ ! -f "$FILE" ]; then | ||
| echo "$FILE does not exist." | ||
| apt install -y python3 python3-dev python3-venv |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| SUdfColumnData *resultData = &resultCol->colData; | ||
| resultData->numOfRows = block->numOfRows; | ||
| for (int32_t i = 0; i < resultData->numOfRows; ++i) { | ||
| int64_t* calc_ts = (int64_t*)udfColDataGetData(block->udfCols[0], i); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
testing, do not merge