1. What is a better input for Word2Vec?

  2. What algorithm can help me discover synonyms?
  3. Estimating data set size for grammar extraction

  4. Averaging multiple scores on small chunks of data or raw score on single collated data

  5. How to compute document similarities in case of source codes?

  6. Implementation of LDA (Latent Dirichlet Allocation) for classification tasks

  7. What distance should I use for edges weights in textrank algorithm
  8. How to calculate lexical cohesion and semantic informativeness for a given dataset?

  9. Need help/direction in an NLP Task (Question Answering) Model
  10. Can LSTM have a confidence score for each word predicted?
  11. Looking for Databases with gender of names and Ethnicity information

  12. Sentence similarity prediction
  13. What effect will replacing words with bigrams have on TF-IDF?

  14. Organization of layers in Keras for an NLP problem

  15. Detecting Offensive Text Content in English and German

  16. How to get the tagset for Hindi POS tagging?

  17. How to detect if one tweet is agreeing with another

  18. Memory efficient softmax over large number of classes

  19. Eliminate low quality predictions in a classification task

  20. How do I learn encoding of a text that is encoded at character level?

  21. Which algorithm does Doc2Vec use?
  22. Word2Vec - CBOW and Skip-Grams

  23. Confusion about Keras' skipgram and sampling table utilities
  24. Is it possible to have variable window size for Continuous Bag of Words method of training word embeddings?

  25. What is a lower bound on the vocabulary size for generating word/sentence embedding using word2vec or skip thought vectors?

  26. Data amount for a very simple chatbot
  27. How are dynamic memory networks employed in sequence to sequence modelling
  28. How to print out to a file using Stanford Classifier

  29. How do NLP tokenizers handle hashtags?

  30. How to read Feature Based Grammar from a string

  31. How do I return Doc2Vec vectors of a corpus after training it using a pre-trained model?

  32. Public dataset for news articles with their associated categories

  33. Proper/Possible methods for extracting unstructured data from websites
  34. How to fix these vanishing gradients?

  35. Alternatives to TF-IDF and Cosine Similarity when comparing documents of differing formats
  36. Classify phrases as biomedical or non-biomedical

  37. How can I run Labeled LDA over one textual document?

  38. Automatic question categorization when we know important words in each category
  39. What is the difference between Deep Structured Semantic Model, or more general, Deep Semantic Similarity Model (DSSM) and Siamese networks

  40. How to create domain rules from raw unstructured text using NLP and deep learning?

  41. Using LSTM to clear up corrupted text files

  42. How to get relevancy score of a term with respect to text/document

  43. Disambiguating different Dates / section headers within text

  44. How to add extra word features other than word embeddings in a recurrent neural network model
  45. How to use word embedding vectors along with other features in a machine learning model?

  46. Make a chatbot using slack
  47. Feature selection - how to go about it using restaurant reviews.
  48. Data Mining - Intent matching and classification of text

  49. Matching the distribution on Sequence length for an RNN

  50. Training data for multi-category classification algorithm

  51. How to collect tweets by geo-location?

  52. Improve the speed of t-SNE implementation in Python for huge data

  53. Is there an "Attention Is All You Need" implementation in Keras?

  54. Word2Vec embeddings with TF-IDF

  55. Is there a process flow to follow for text analytics?
  56. Are there any tools to make text labeling faster?
  57. How to change default values of ANNIE resources in GATE from Java code?

  58. Bag of words and word2vec clarifications

  59. Word2Vec: Using pre-trained models

  60. A Text Sections Classifier
  61. Word2Vec, softmax function

  62. Accuracy of word and sent tokenize versus custom tokenizers in nltk

  63. How can I create a "trained" dataset for categorizing news articles?

  64. State of the art approaches for Information retrieval tasks based on deep learning

  65. Need help in improving accuracy of text classification using Naive Bayes in nltk for movie reviews
  66. Using several documents with word2vec
  67. NLP grouping word categories

  68. Supervised learning on sources of information with different importance

  69. How to find categorical features from a vector representation of text?

  70. Non-brute force approach to finding permissible English word anagrams
  71. Help debugging XGBoost prediction function
  72. Resolving time in NLP
  73. Using HashingVectorizer for text vectorization
  74. Using Vowpal Wabbit for NER

  75. Type of model for space-characters recovery?
  76. Document classification - optimal classifier & embedding

  77. Rank terms in a bag-of-words model

  78. Using NLP to detect insurance Fraud

  79. How to semantically compare a paragraph to a collection of documents?

  80. What tools are available for programming language parsing for ML?

  81. Giving higher priority to certain inputs in SKLearn Random Forest
  82. How to identify sentiment of a given word from a sentence
  83. Attention Methods
  84. Data preprocessing for recovery symbols

  85. What approach is to be taken to convert a code snippet to simple English?

  86. How to initialize a new word2vec model with pre-trained model weights?

  87. How to cluster sentences based on company names from a post(s) containing several company names using similarity metric.

  88. When to use cosine similarity over Euclidean similarity
  89. Any research on segmentation of non-text contents out of (mostly) text-documents?

  90. Why labelled-LDA (LLDA) weakly assigns documents to topics

  91. Performance Metric for topic extraction when there is no ground truth
  92. I'm trying to classify text using google cloud NLP API, but my code returns a null, can someone explain why?
  93. Sentiment analysis using sources other than the IMDB data

  94. Hierarchical Softmax Probabilities Calculations
  95. Interdependent Classifiers

  96. So what's the catch with LSTM?

  97. What is the difference between word-based and char-based text generation RNNs?
  98. Help diagnosing model behavior

  99. Sub topics with Latent Dirichlet Allocation

  100. Should I use regex or machine learning?