Skip to content

Conversation

@IcyFeather233
Copy link
Contributor

What type of PR is this?

/kind feature

What this PR does / why we need it:

This is the implementation of

CNCF - KubeEdge: Domain-specific large model benchmarks: the edge perspective (2025 Term 1)

This PR is related to #177

@kubeedge-bot kubeedge-bot added the kind/feature Categorizes issue or PR as related to a new feature. label May 6, 2025
@kubeedge-bot kubeedge-bot requested review from MooreZheng and hsj576 May 6, 2025 14:36
@kubeedge-bot kubeedge-bot added the size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. label May 6, 2025
@IcyFeather233 IcyFeather233 changed the title LFX: Implementation of [LFX] Domain-specific large model benchmarks: the edge perspective May 6, 2025
Copy link
Collaborator

@MooreZheng MooreZheng left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The PR looks fine to me so far. A few suggestions:

  1. An issue is needed, showing the need to add a module of pre-process into the single task learning scheme and its consequences to community members (i.e., should be fine since the pre-process module could be skipped).
  2. Take the 1-2-3-4 dimensions out from the data of each query and make the dimension information as metadata under specific directories, in order to make the dataset simple.
  3. Use English after all contented fixed. Besides, there could also be method to ineterate both English and Chinese version for readme (see this link).
  4. Try to fix the CI issue

Further reviews could be taken after demonstrations with experiments.

@kubeedge-bot
Copy link
Collaborator

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
To complete the pull request process, please assign moorezheng after the PR has been reviewed.
You can assign the PR to them by writing /assign @moorezheng in a comment when ready.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@IcyFeather233
Copy link
Contributor Author

image
It seems that this CI / pylink (3.9) (pull_request) is not caused by my code change.

@AryanNanda17
Copy link
Contributor

AryanNanda17 commented Jun 5, 2025

@IcyFeather233, you can try syncing your fork's main branch with ianvs main branch. PR #213 which is recently merged, corrects the error.

@kubeedge-bot kubeedge-bot added needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. and removed needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. labels Jun 12, 2025
@IcyFeather233
Copy link
Contributor Author

@MooreZheng @hsj576 Hi please review this PR

Copy link
Collaborator

@MooreZheng MooreZheng left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall it looks good now. Some tiny comments on the routine meeting

  1. Update the Kaggle dataset for the embedding
  2. Take a look at the preprocess issue and add advices about Sedna version in this PR if needed. If it is about other documents like quick start, advices could be added using another PR.
  3. Currently the PR has 9 commits and squash the commits into one.

@kubeedge-bot kubeedge-bot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Jun 20, 2025
@kubeedge-bot kubeedge-bot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Jul 16, 2025
@kubeedge-bot kubeedge-bot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Jul 22, 2025
@kubeedge-bot kubeedge-bot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Jul 22, 2025
@IcyFeather233
Copy link
Contributor Author

IcyFeather233 commented Jul 22, 2025

  1. I have updated the kaggle dataset here, and related information are also added in the PR README file.
  2. The preprocess will not influence other examples, to use, just add preprocess function in BaseModel class like this:
@ClassFactory.register(ClassType.GENERAL, alias="gen")
class BaseModel:

    def __init__(self, **kwargs):
        ...

    def preprocess(self, **kwargs):
        # add your preprocess before train
        self.rag = GovernmentRAG(model_name="/path/to/models/bge-m3", device="cuda", persist_directory="./chroma_db")
        LOGGER.info("RAG initialized")


    def train(self, train_data, valid_data=None, **kwargs):
        ...
        
    def save(self, model_path):
        ...

    def predict(self, data, input_shape=None, **kwargs):
        ...

    def load(self, model_url=None):
        ...

    def evaluate(self, data, model_path, **kwargs):
        ...

If it is not needed, do not add this function is ok, because it is written like this:

def _preprocess(self, job):
        if job.preprocess() is None:
            return None
        return job.preprocess()

So it will not influence previous examples.
3. All commits have been merged into one

@IcyFeather233 IcyFeather233 requested a review from MooreZheng July 22, 2025 07:15
Copy link
Collaborator

@MooreZheng MooreZheng left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/lgtm

@kubeedge-bot kubeedge-bot added the lgtm Indicates that a PR is ready to be merged. label Jul 30, 2025
@MooreZheng
Copy link
Collaborator

This is from the LFX mentorship Term 2 and the PR looks good to me now. We need another lgtm from @hsj576 :D

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

kind/feature Categorizes issue or PR as related to a new feature. lgtm Indicates that a PR is ready to be merged. size/XL Denotes a PR that changes 500-999 lines, ignoring generated files.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants