Implementation of [LFX] Domain-specific large model benchmarks: the edge perspective #196
MooreZheng
left a comment
The PR looks fine to me so far. A few suggestions:
- Open an issue explaining why a preprocess module needs to be added to the single task learning scheme and what the consequences are for community members (it should be fine, since the preprocess module can be skipped).
- Take the 1-2-3-4 dimensions out of each query's data and store the dimension information as metadata under specific directories, to keep the dataset simple.
- Switch to English once all content is finalized. There is also a way to integrate both the English and Chinese versions of the README (see this link).
- Try to fix the CI issue.
Further review can follow once experiments have been demonstrated.
[APPROVALNOTIFIER] This PR is NOT APPROVED
This pull-request has been approved by:
The full list of commands accepted by this bot can be found here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing
@IcyFeather233, you can try syncing your fork's main branch with the ianvs main branch. PR #213, which was recently merged, corrects the error.
@MooreZheng @hsj576 Hi, please review this PR.
MooreZheng
left a comment
Overall it looks good now. A few minor comments from the routine meeting:
- Update the Kaggle dataset for the embedding.
- Take a look at the preprocess issue and add advice about the Sedna version in this PR if needed. If it concerns other documents such as the quick start, the advice can be added in a separate PR.
- The PR currently has 9 commits; please squash them into one.
Signed-off-by: IcyFeather <[email protected]>
    @ClassFactory.register(ClassType.GENERAL, alias="gen")
    class BaseModel:
        def __init__(self, **kwargs):
            ...

        def preprocess(self, **kwargs):
            # add your preprocess before train
            self.rag = GovernmentRAG(model_name="/path/to/models/bge-m3", device="cuda", persist_directory="./chroma_db")
            LOGGER.info("RAG initialized")

        def train(self, train_data, valid_data=None, **kwargs):
            ...

        def save(self, model_path):
            ...

        def predict(self, data, input_shape=None, **kwargs):
            ...

        def load(self, model_url=None):
            ...

        def evaluate(self, data, model_path, **kwargs):
            ...

If preprocessing is not needed, it is fine to leave this function out, because the call is written like this:

    def _preprocess(self, job):
        if job.preprocess() is None:
            return None
        return job.preprocess()

So it will not affect previous examples.
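The optional preprocess hook discussed above can be illustrated with a small, self-contained sketch. This is not the ianvs or Sedna implementation: `run_optional_preprocess`, `WithPreprocess`, and `WithoutPreprocess` are hypothetical names introduced only for illustration. The sketch assumes the runner should skip the step when a model class does not define `preprocess` at all, and otherwise call it exactly once.

```python
# Hypothetical sketch: none of these names come from ianvs or Sedna.


def run_optional_preprocess(job, **kwargs):
    """Call job.preprocess(**kwargs) if the job defines it, else return None."""
    hook = getattr(job, "preprocess", None)
    if not callable(hook):
        # Models that never defined preprocess() are left untouched.
        return None
    # Call the hook exactly once and pass its result through.
    return hook(**kwargs)


class WithPreprocess:
    """Stand-in for a model that needs setup before training."""

    def __init__(self):
        self.ready = False

    def preprocess(self, **kwargs):
        # Placeholder for real setup work, e.g. building a retrieval index.
        self.ready = True
        return "preprocessed"


class WithoutPreprocess:
    """Stand-in for an existing example that has no preprocess step."""

    def train(self, train_data, **kwargs):
        return len(train_data)


new_model = WithPreprocess()
old_model = WithoutPreprocess()
result_new = run_optional_preprocess(new_model)
result_old = run_optional_preprocess(old_model)
```

Looking the method up with `getattr` lets a class omit `preprocess` entirely, and storing the hook in a local variable avoids running a potentially expensive setup step twice.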
MooreZheng
left a comment
/lgtm
This is from the LFX mentorship Term 2 and the PR looks good to me now. We need another lgtm from @hsj576 :D

What type of PR is this?
/kind feature
What this PR does / why we need it:
This is the implementation of
CNCF - KubeEdge: Domain-specific large model benchmarks: the edge perspective (2025 Term 1)
This PR is related to #177