Install CaboCha 0.68 and perform dependency analysis in Python
First, install MeCab: https://code.google.com/p/mecab/downloads/list
This article assumes that mecab-0.996.exe was installed with UTF-8 selected as the character encoding.
Download cabocha-0.68.exe: http://code.google.com/p/cabocha/downloads/list
Run the downloaded EXE. When the installer asks for a character encoding, select the same one you chose for MeCab.
Add "C:\Program Files (x86)\CaboCha\bin" to the PATH environment variable so that cabocha can be run from the command prompt. Python also needs this entry to find the DLL.
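To double-check the PATH setting, something like the following Python 3 sketch can be used. It only inspects the PATH string and asks the standard library where "cabocha" resolves; the helper name path_contains is made up for this example, and the directory is the install location assumed in this article.

```python
import ntpath
import os
import shutil

# Install location assumed in this article; adjust if CaboCha lives elsewhere.
CABOCHA_BIN = r"C:\Program Files (x86)\CaboCha\bin"


def path_contains(path_value, directory, sep=";"):
    """Return True if `directory` is one entry of a Windows-style PATH string.

    Hypothetical helper for this sketch. Comparison ignores case,
    trailing backslashes, and surrounding quotes.
    """
    wanted = ntpath.normcase(ntpath.normpath(directory))
    entries = [e.strip().strip('"') for e in path_value.split(sep) if e.strip()]
    return any(ntpath.normcase(ntpath.normpath(e)) == wanted for e in entries)


if __name__ == "__main__":
    print("bin directory on PATH:", path_contains(os.environ.get("PATH", ""), CABOCHA_BIN))
    # shutil.which returns None when no matching executable is found on PATH.
    print("cabocha resolves to:", shutil.which("cabocha"))
```

Remember that a command prompt opened before the environment variable was changed still sees the old PATH.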
To confirm that it runs, create a UTF-8 file called input.txt containing the text you want to analyze, and run the following from the command prompt.
cabocha < input.txt > out.txt
If the analysis succeeds, out.txt contains output like the following.
here---D
Marisa's-D
It's a slow place!
EOS
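The same file-based redirection can also be driven from Python. The following is a minimal Python 3 sketch, not part of the original procedure: write_utf8 and run_with_files are hypothetical helper names, and the code only assumes that a "cabocha" executable is on PATH when the commented-out call is enabled.

```python
import subprocess


def write_utf8(path, text):
    """Write `text` to `path` as UTF-8 with plain LF line endings."""
    with open(path, "w", encoding="utf-8", newline="\n") as f:
        f.write(text)


def run_with_files(command, input_path, output_path):
    """Run `command` with stdin read from input_path and stdout written to output_path.

    Equivalent to `command < input_path > output_path` at the command prompt.
    Returns the exit code of the command.
    """
    with open(input_path, "rb") as src, open(output_path, "wb") as dst:
        return subprocess.call(command, stdin=src, stdout=dst)


# Example (requires cabocha on PATH, so it is left commented out):
# write_utf8("input.txt", u"...text to analyze...\n")
# run_with_files(["cabocha"], "input.txt", "out.txt")
```

Because both files are opened in binary mode, the bytes pass through unchanged and the console encoding never gets involved.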
Note that files are used for input and output here because the command prompt cannot handle UTF-8. If the analysis fails, the sentence comes back on a single line without being split, like this:
This is Marisa's slow place!
EOS
If this happens, the character encoding of input.txt may not be UTF-8. (Note that Notepad saves files as ANSI by default.)
The following error may also occur.
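One way to check the encoding of input.txt is to try decoding its bytes. The Python 3 sketch below does that and, if needed, re-saves the file as UTF-8. The function names are made up for this example, and the default source encoding cp932 is an assumption (it is what "ANSI" means on Japanese Windows). Note that a file containing only ASCII characters passes the check under either encoding.

```python
def looks_like_utf8(data):
    """Return True if the byte string decodes cleanly as UTF-8."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


def convert_to_utf8(src_path, dst_path, src_encoding="cp932"):
    """Re-save a text file as UTF-8.

    src_encoding defaults to cp932, which is an assumption about the
    machine the file was created on.
    """
    with open(src_path, "r", encoding=src_encoding, newline="") as src:
        text = src.read()
    with open(dst_path, "w", encoding="utf-8", newline="") as dst:
        dst.write(text)


if __name__ == "__main__":
    import os
    if os.path.exists("input.txt"):
        with open("input.txt", "rb") as f:
            print("input.txt is valid UTF-8:", looks_like_utf8(f.read()))
```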
svm.cpp(140) [version == MODEL_VERSION] incompatible version: 101
svm.cpp(751) [size >= 2] dep.cpp(79) [!failed] no such file or directory: C:\Program Files (x86)\CaboCha\etc\..\model\dep.ipa.model
In this case, files from an older version of CaboCha are still being picked up, so the upgrade did not take effect. Delete the following folder (replace "User name" with your Windows user name).
C:\Users\User name\AppData\Local\VirtualStore\Program Files (x86)\CaboCha
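If you prefer to locate that folder from a script, the following Python 3 sketch builds the VirtualStore path from the LOCALAPPDATA environment variable and, by default, only reports what it would delete. The function names are hypothetical, and the sketch assumes CaboCha was installed to C:\Program Files (x86)\CaboCha as in this article.

```python
import ntpath
import os
import shutil


def virtualstore_copy(local_appdata, install_dir=r"C:\Program Files (x86)\CaboCha"):
    """Map an install directory to its per-user VirtualStore location."""
    _drive, tail = ntpath.splitdrive(install_dir)
    return ntpath.join(local_appdata, "VirtualStore", tail.lstrip("\\/"))


def remove_stale_copy(path, dry_run=True):
    """Delete `path` if it is a directory. Returns True if it existed.

    With dry_run=True (the default) nothing is deleted; the path is only printed.
    """
    if not os.path.isdir(path):
        return False
    if dry_run:
        print("would delete:", path)
    else:
        shutil.rmtree(path)
    return True


if __name__ == "__main__":
    local = os.environ.get("LOCALAPPDATA")  # set by Windows; absent on other systems
    if local:
        remove_stale_copy(virtualstore_copy(local))
```

Pass dry_run=False only after confirming that the printed path is the one you intend to remove.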
Next, build the Python binding. Download cabocha-0.68.tar.bz: http://code.google.com/p/cabocha/downloads/list The archive can be extracted with a tool such as Lhaplus.
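As an alternative to a GUI tool, Python's standard tarfile module can unpack the archive. This is a minimal Python 3 sketch; extract_archive is a name made up here, and the mode "r:*" lets tarfile detect the compression itself, so the exact file extension does not matter.

```python
import tarfile


def extract_archive(archive_path, dest_dir="."):
    """Extract a (possibly compressed) tar archive and return the member names."""
    # "r:*" asks tarfile to detect gzip/bzip2/xz compression automatically.
    with tarfile.open(archive_path, "r:*") as tar:
        tar.extractall(dest_dir)
        return tar.getnames()


# Example (the file name is whatever you downloaded):
# names = extract_archive("path/to/downloaded/archive", "C:/work")
# print(names[:5])
```

Only extract archives from a source you trust, since extractall writes whatever paths the archive contains.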
Change to the python folder inside the extracted folder and run the following command. With the setup.py as shipped, it fails with the traceback shown after the command.
python setup.py install
Traceback (most recent call last):
  File "setup.py", line 13, in <module>
    version = cmd1("cabocha-config --version"),
  File "setup.py", line 7, in cmd1
    return os.popen(str).readlines()[0][:-1]
IndexError: list index out of range
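This happens because cabocha-config is not installed on Windows: the command prints nothing, so readlines() returns an empty list and indexing it with [0] raises the IndexError.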
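The failure mode can be reproduced outside setup.py. The Python 3 sketch below is not the fix used in this article (which hard-codes the values); it only illustrates a variant of cmd1 that falls back to a default when the command produces no output. The name first_line and the default parameter are made up for this example.

```python
import subprocess


def first_line(command, default=None):
    """Return the first line a shell command prints, or `default` if it prints nothing.

    Indexing [0] on the empty result is what raises IndexError in the
    original cmd1; here the empty case is handled explicitly.
    """
    result = subprocess.run(
        command,
        shell=True,
        stdout=subprocess.PIPE,
        stderr=subprocess.DEVNULL,  # hide the shell's "command not found" message
        universal_newlines=True,
    )
    lines = result.stdout.splitlines()
    return lines[0] if lines else default


if __name__ == "__main__":
    # On a machine without cabocha-config this prints the fallback value.
    print(first_line("cabocha-config --version", default="0.68"))
```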
setup.py before the change:
#!/usr/bin/env python
from distutils.core import setup,Extension,os
import string
def cmd1(str):
    return os.popen(str).readlines()[0][:-1]
def cmd2(str):
    return string.split (cmd1(str))
setup(name = "cabocha-python",
	version = cmd1("cabocha-config --version"),
	py_modules=["CaboCha"],
	ext_modules = [
		Extension("_CaboCha",
			["CaboCha_wrap.cxx",],
			include_dirs=cmd2("cabocha-config --inc-dir"),
			library_dirs=cmd2("cabocha-config --libs-only-L"),
			libraries=cmd2("cabocha-config --libs-only-l"))
			])
Replace version and the arguments of Extension in ext_modules with values that match the installed CaboCha.
setup.py after the change:
#!/usr/bin/env python
from distutils.core import setup,Extension,os
import string
def cmd1(str):
    return os.popen(str).readlines()[0][:-1]
def cmd2(str):
    return string.split (cmd1(str))
setup(name = "cabocha-python",
	version = "0.68",
	py_modules=["CaboCha"],
	ext_modules = [
		Extension("_CaboCha",
			["CaboCha_wrap.cxx",],
			include_dirs=[r"C:\Program Files (x86)\CaboCha\sdk"],
			library_dirs=[r"C:\Program Files (x86)\CaboCha\sdk"],
			libraries=['libcabocha'])
])
Then run the installation again.
python setup.py install
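A quick way to confirm that the binding was installed is to try importing it with the same Python interpreter. The sketch below is written to run on both Python 2.7 and Python 3; is_importable is a hypothetical helper name. If the import fails even though the install succeeded, one thing to check (an assumption, based on the PATH note earlier) is whether the CaboCha bin folder is on PATH so that the DLL can be found.

```python
import importlib


def is_importable(module_name):
    """Return True if `module_name` can be imported by this interpreter."""
    try:
        importlib.import_module(module_name)
    except ImportError:
        # Covers both a missing module and, on Windows, a DLL that fails to load.
        return False
    return True


if __name__ == "__main__":
    print("CaboCha importable: %s" % is_importable("CaboCha"))
```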
The following script then runs dependency analysis from Python.
#!/usr/bin/python
# -*- coding: utf-8 -*-
import CaboCha

# Create a parser with default options
c = CaboCha.Parser("")
sentence = "Return the hat"
# Alternative: print c.parseToString(sentence)
tree = c.parse(sentence)
# Print the tree and lattice representations
print tree.toString(CaboCha.FORMAT_TREE)
print tree.toString(CaboCha.FORMAT_LATTICE)
#print tree.toString(CaboCha.FORMAT_XML)
# Per-chunk information
for i in range(tree.chunk_size()):
    chunk = tree.chunk(i)
    print 'Chunk:', i
    print ' Score:', chunk.score
    print ' Link:', chunk.link  # index of the chunk this one depends on (-1: none)
    print ' Size:', chunk.token_size
    print ' Pos:', chunk.token_pos  # index of the chunk's first token
    print ' Head:', chunk.head_pos  # head word (position within the chunk)
    print ' Func:', chunk.func_pos  # function word (position within the chunk)
    print ' Features:',
    for j in range(chunk.feature_list_size):
        print '  ' + chunk.feature_list(j)
    print
    print 'Text'
    for ix in range(chunk.token_pos, chunk.token_pos + chunk.token_size):
        print ' ', tree.token(ix).surface
    print
# Per-token information
for i in range(tree.token_size()):
    token = tree.token(i)
    print 'Surface:', token.surface
    print ' Normalized:', token.normalized_surface
    print ' Feature:', token.feature
    print ' NE:', token.ne  # named entity
    print ' Info:', token.additional_info
    print ' Chunk:', token.chunk
    print
Running the script prints the following.
Hat-D
return
EOS
* 0 1D 0/1 0.000000
Hat noun,General,*,*,*,*,hat,Bow,Boshi
Particles,Case particles,General,*,*,*,To,Wo,Wo
* 1 -1D 0/0 0.000000
Verb to return,Independence,*,*,Godan / Sa line,Uninflected word,return,Kaes,Kaes
EOS
Chunk: 0
 Score: 0.0
 Link: 1
 Size: 2
 Pos: 0
 Head: 0
 Func: 1
 Features:   FCASE:To
  FHS:hat
  FHP0:noun
  FHP1:General
  FFS:To
  FFP0:Particle
  FFP1:Case particles
  FFP2:General
  FLS:hat
  FLP0:noun
  FLP1:General
  FRS:To
  FRP0:Particle
  FRP1:Case particles
  FRP2:General
  LF:To
  RL:hat
  RH:hat
  RF:To
  FBOS:1
  GCASE:To
  A:To
Text
hat
To
Chunk: 1
 Score: 0.0
 Link: -1
 Size: 1
 Pos: 2
 Head: 0
 Func: 0
 Features:   FHS:return
  FHP0:verb
  FHP1:Independence
  FHF:Uninflected word
  FFS:return
  FFP0:verb
  FFP1:Independence
  FFF:Uninflected word
  FLS:return
  FLP0:verb
  FLP1:Independence
  FLF:Uninflected word
  FRS:return
  FRP0:verb
  FRP1:Independence
  FRF:Uninflected word
  LF:return
  RL:return
  RH:return
  RF:return
  FEOS:1
  A:Uninflected word
Text
return
Surface:hat
 Normalized:hat
 Feature:noun,General,*,*,*,*,hat,Bow,Boshi
 NE: None
 Info: None
 Chunk: <CaboCha.Chunk; proxy of <Swig Object of type 'CaboCha::Chunk *' at 0x0274A170> >
Surface:To
 Normalized:To
 Feature:Particle,Case particles,General,*,*,*,To,Wo,Wo
 NE: None
 Info: None
 Chunk: None
Surface:return
 Normalized:return
 Feature:verb,Independence,*,*,Godan / Sa line,Uninflected word,return,Kaes,Kaes
 NE: None
 Info: None
 Chunk: <CaboCha.Chunk; proxy of <Swig Object of type 'CaboCha::Chunk *' at 0x0274A170> >
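The lattice output above can also be processed without the binding, for example when only out.txt is available. The header line "* 0 1D 0/1 0.000000" lines up with the values printed for Chunk 0: index 0, Link 1, Head 0, Func 1, Score 0.0. The following Python 3 sketch parses that layout. It is only an illustration: the names are made up, the sample text is adapted from the output above, and it assumes that surface and feature are separated by a tab on each token line.

```python
import re
from collections import namedtuple

# One chunk of the lattice output; tokens is a list of (surface, feature) pairs.
Chunk = namedtuple("Chunk", "index link head func score tokens")

# Matches header lines such as "* 0 1D 0/1 0.000000".
HEADER = re.compile(r"^\* (\d+) (-?\d+)D (\d+)/(\d+) (-?[\d.]+)$")


def parse_lattice(text):
    """Parse lattice-format text into a list of Chunk tuples."""
    chunks = []
    for line in text.splitlines():
        if line == "EOS" or not line.strip():
            continue
        m = HEADER.match(line)
        if m:
            chunks.append(Chunk(int(m.group(1)), int(m.group(2)), int(m.group(3)),
                                int(m.group(4)), float(m.group(5)), []))
        elif chunks:
            # Assumption: token lines are "surface<TAB>feature".
            surface, _, feature = line.partition("\t")
            chunks[-1].tokens.append((surface, feature))
    return chunks


def dependency_pairs(chunks):
    """Return (dependent surfaces, head surfaces) for every chunk that has a link."""
    def surfaces(chunk):
        return [s for s, _ in chunk.tokens]
    return [(surfaces(c), surfaces(chunks[c.link])) for c in chunks if c.link >= 0]


# Sample adapted from the output above (tab separators are assumed).
SAMPLE = (
    "* 0 1D 0/1 0.000000\n"
    "Hat\tnoun,General,*,*,*,*,hat,Bow,Boshi\n"
    "To\tParticle,Case particles,General,*,*,*,To,Wo,Wo\n"
    "* 1 -1D 0/0 0.000000\n"
    "return\tverb,Independence,*,*,Godan / Sa line,Uninflected word,return,Kaes,Kaes\n"
    "EOS\n"
)

if __name__ == "__main__":
    for dependent, head in dependency_pairs(parse_lattice(SAMPLE)):
        print(dependent, "->", head)
```

A link of -1 marks the root chunk, which is why it is skipped when building the pairs.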