sriks987/UnReady

UnReady

Main Tasks

  • Referencing Flow of Conversation using Usernames
  • Topic Clustering

Ideas

  1. Get the subject of a sentence, possibly with LDA (https://towardsdatascience.com/nlp-extracting-the-main-topics-from-your-dataset-using-lda-in-minutes-21486f5aa925)
  2. A message is more likely to be a reply if it uses a pronoun
  3. Message gap: the probability that a message refers to a particular earlier message decreases exponentially with the gap between them
  4. Use context to decide which message a reply belongs to: find the subject of the conversation and map the message to the closest entity by default
    • Direct word matches give a higher probability
  5. A single-word message is a response to an earlier message
  6. 1v1 conversations (based on a threshold) are always treated as the same topic
  7. Parallel models: word similarity, sentence similarity and reference matching
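
Ideas 2 to 4 can be combined into a single score. The following is only a minimal sketch, not the project's implementation: the names `reply_score` and `most_likely_parent`, the pronoun list, and the `decay`, `pronoun_boost` and `match_boost` values are all assumptions chosen for illustration, and the gap is assumed to be counted in messages.

```python
import math
import re

# Hypothetical pronoun list; a real one would be larger.
PRONOUNS = {"he", "she", "it", "they", "him", "her", "them", "this", "that", "you"}


def tokens(text):
    """Lower-case word tokens; stopwords are kept, as for word matching."""
    return re.findall(r"[a-z']+", text.lower())


def reply_score(candidate, earlier, gap, decay=0.3, pronoun_boost=1.5, match_boost=3.0):
    """Unnormalised score that `candidate` replies to `earlier`, `gap` messages back.

    All weights are illustrative assumptions, not tuned values.
    """
    # Idea 3: the score decays exponentially with the message gap.
    score = math.exp(-decay * gap)
    cand = set(tokens(candidate))
    # Idea 2: a pronoun makes a reply more likely.
    if cand & PRONOUNS:
        score *= pronoun_boost
    # Idea 4: direct word matches raise the score.
    shared = (cand & set(tokens(earlier))) - PRONOUNS
    if shared:
        score *= match_boost
    return score


def most_likely_parent(history, candidate):
    """Return the index in `history` (oldest first) with the highest score."""
    n = len(history)
    scores = [reply_score(candidate, msg, gap=n - i) for i, msg in enumerate(history)]
    return max(range(n), key=scores.__getitem__)
```

With a history of `["Are we meeting at the library tomorrow?", "Did anyone watch yesterday's match?", "lol"]`, the message `"Yes, library works for me"` is mapped to the first message, because the word match on "library" outweighs the larger gap.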

Columns of Dataset

  1. Gap between messages
  2. TimeStamp
  3. Unread/Read
  4. Username
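
One possible way to hold these columns in code is sketched below. The class name `MessageRow`, the helper `build_rows`, the `last_read` cut-off and the choice to measure the gap in seconds are assumptions made for this example; the list above does not fix any of them.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class MessageRow:
    username: str
    timestamp: datetime
    gap_seconds: float  # time since the previous message (0.0 for the first)
    unread: bool


def build_rows(messages, last_read):
    """Build rows from (username, timestamp) pairs sorted by time.

    A message counts as unread if it arrived after `last_read`.
    """
    rows = []
    prev = None
    for username, ts in messages:
        gap = (ts - prev).total_seconds() if prev is not None else 0.0
        rows.append(MessageRow(username, ts, gap, unread=ts > last_read))
        prev = ts
    return rows
```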

Preprocessing

  1. Removal of Stopwords
    • Word Similarity - No Removal
    • Sentence Similarity - Removal Needed
    • Reference Matching - No Removal
  2. Tokenisation and Lemmatisation
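
The per-model stopword rule above could look roughly like this. It is a sketch under assumptions: the tiny `STOPWORDS` set, the model keys and the `preprocess` function are made up for illustration, and lemmatisation is left out here (a library lemmatiser could be applied to the returned tokens).

```python
import re

# Hypothetical, deliberately tiny stopword list.
STOPWORDS = {"a", "an", "the", "is", "are", "to", "of", "and", "in", "at", "we"}

# Whether each parallel model removes stopwords, as listed above.
REMOVE_STOPWORDS = {
    "word_similarity": False,
    "sentence_similarity": True,
    "reference_matching": False,
}


def preprocess(text, model):
    """Tokenise `text` and drop stopwords only if `model` requires it."""
    toks = re.findall(r"[a-z0-9']+", text.lower())
    if REMOVE_STOPWORDS[model]:
        toks = [t for t in toks if t not in STOPWORDS]
    return toks
```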

Parsing

  1. Extracting WhatsApp Messages
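
A minimal parsing sketch follows. It assumes an exported chat whose lines look like `12/01/20, 21:16 - Alice: message` (day/month/year, 24-hour clock); the real export format depends on the phone's locale, so the regular expressions and the `parse_export` function are assumptions to adapt, not a fixed format.

```python
import re
from datetime import datetime

# Assumed line shape: "dd/mm/yy, HH:MM - Username: text"
LINE_RE = re.compile(r"^(\d{1,2}/\d{1,2}/\d{2}), (\d{1,2}:\d{2}) - ([^:]+): (.*)$")
# A timestamped line without "Username:" is treated as a system notice.
STAMP_RE = re.compile(r"^\d{1,2}/\d{1,2}/\d{2}, \d{1,2}:\d{2} - ")


def parse_export(lines):
    """Parse exported chat lines into dicts with timestamp, username and text."""
    messages = []
    for line in lines:
        line = line.rstrip("\n")
        m = LINE_RE.match(line)
        if m:
            date, time, user, text = m.groups()
            ts = datetime.strptime(f"{date} {time}", "%d/%m/%y %H:%M")
            messages.append({"timestamp": ts, "username": user, "text": text})
        elif STAMP_RE.match(line):
            continue  # system notice, not a user message
        elif messages:
            # No timestamp: continuation of a multi-line message.
            messages[-1]["text"] += "\n" + line
    return messages
```

A system notice that happens to contain a colon would be misread as a user message by this pattern, so real data may need extra filtering.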

Assumptions

  1. The reply feature (used in common messaging applications) is not applicable
  2. The language used is not that of a villager.

About

Text Segregation On Asynchronous Group Chat
