Features - Sablayrolles/debates GitHub Wiki

Debates wiki -- Features

Requirements

A Stanford CoreNLP server must be running before features can be extracted. Start it from the CoreNLP directory with:

java -mx4g -cp "*" edu.stanford.nlp.pipeline.StanfordCoreNLPServer -annotators "tokenize,ssplit,pos,lemma,parse,sentiment" -port 9000 -timeout 30000
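
To check that the server started above is reachable, it can be queried directly over HTTP. The following is a minimal sketch that uses only the Python standard library. The helper names `corenlp_url` and `annotate` are hypothetical and not part of this project, and the host is assumed to be localhost.

```python
import json
import urllib.request
from urllib.parse import urlencode

# Same annotators as the server command line above.
ANNOTATORS = "tokenize,ssplit,pos,lemma,parse,sentiment"


def corenlp_url(host="localhost", port=9000, annotators=ANNOTATORS):
    """Build the request URL for a CoreNLP server (hypothetical helper)."""
    props = {"annotators": annotators, "outputFormat": "json"}
    return "http://%s:%d/?%s" % (host, port, urlencode({"properties": json.dumps(props)}))


def annotate(text, timeout_s=30):
    """POST raw text to the server and return the parsed JSON answer.

    Requires a running server; timeout_s mirrors the 30000 ms server timeout.
    """
    request = urllib.request.Request(corenlp_url(), data=text.encode("utf-8"))
    with urllib.request.urlopen(request, timeout=timeout_s) as response:
        return json.loads(response.read().decode("utf-8"))
```

With the server running, `annotate("My cat is eating the mouse!")` should return a dictionary whose "sentences" entry holds one item per detected sentence.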

Feature Extraction

Segmenting the discourse into a table of sentences

import sys
sys.path.append("..")

import my_coreNLP.parseNLP as parseNLP

import features.saveData as saveData
import features.computeFeatures as computeFeatures

NLP = parseNLP.StanfordNLP()
txt = {"num": 1, "question": 1, "edu": "My cat is eating the mouse!"}
# num: number of the EDU in the corpus
# question: number of the associated question
# edu: text of the EDU

s = saveData.compute(txt, NLP)
f = computeFeatures.returnFeatures(s, ["as?", "as!", "nb1stPers", "nb2ndPers"])

print(f)

List of features

default

  • num : number of the EDU (elementary discourse unit) in the corpus
  • edu : text of the EDU
  • question : number of the associated question in the debate

optional

  • "as?" : returns 1 if the EDU contains a '?' character, 0 otherwise
  • "as!" : returns 1 if the EDU contains a '!' character, 0 otherwise
  • "as..." : returns 1 if the EDU contains '...', 0 otherwise
  • "nb1stPers" : returns the number of 1st person (singular and plural) personal pronouns
  • "nb2ndPers" : returns the number of 2nd person (singular and plural) personal pronouns
  • "nb3rdSingPers" : returns the number of 3rd person singular personal pronouns
  • "nb3rdPluPers" : returns the number of 3rd person plural personal pronouns
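
To make the optional features concrete, here is a minimal, self-contained sketch of what each one measures. It is not the project's implementation: computeFeatures presumably relies on the CoreNLP annotations, whereas this sketch uses plain string matching. The function name `sketch_features` and the pronoun word lists are assumptions made for illustration. In particular, only subject and object pronouns are counted, not possessives such as "my".

```python
import re

# Assumed word lists (subject/object forms only); the project may count differently.
FIRST_PERSON = {"i", "me", "we", "us"}
SECOND_PERSON = {"you"}
THIRD_SINGULAR = {"he", "she", "it", "him", "her"}
THIRD_PLURAL = {"they", "them"}


def sketch_features(edu, names):
    """Return {feature name: value} for the requested optional features."""
    # Lowercase word tokens; punctuation is dropped for the pronoun counts.
    tokens = re.findall(r"[a-z']+", edu.lower())

    def count(words):
        return sum(token in words for token in tokens)

    available = {
        "as?": lambda: int("?" in edu),
        "as!": lambda: int("!" in edu),
        "as...": lambda: int("..." in edu),
        "nb1stPers": lambda: count(FIRST_PERSON),
        "nb2ndPers": lambda: count(SECOND_PERSON),
        "nb3rdSingPers": lambda: count(THIRD_SINGULAR),
        "nb3rdPluPers": lambda: count(THIRD_PLURAL),
    }
    return {name: available[name]() for name in names}


print(sketch_features("My cat is eating the mouse!",
                      ["as?", "as!", "nb1stPers", "nb2ndPers"]))
# {'as?': 0, 'as!': 1, 'nb1stPers': 0, 'nb2ndPers': 0}
```

Because the pronoun lists are matched on surface forms, ambiguous words such as "her" are always counted as pronouns here; a version based on part-of-speech tags would avoid that.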

Examples

The following example files can help you get started with this project:

File                          Description
example_features_extract.py   Extraction of features from a sentence

Home wiki file : Home