txt2graph - generates a dot-file from text-dependencies
Try http://andreas-romeyke.de/txt2graph/. The changelog is at http://andreas-romeyke.de/txt2graph/CHANGES,
and the latest tarball is at http://andreas-romeyke.de/txt2graph/txt2graph1.2.3.tar.gz
txt2graph visualizes the structure and dependencies of a text read from stdin and writes a dot file for GraphViz to stdout.
You need the GraphViz tools installed; see http://www.graphviz.org or http://www.research.att.com/sw/tools/graphviz/
You need the Perl module GraphViz.pm by Leon Brocard (since version 1.2). You can get it from CPAN.org: enter the CPAN shell via 'perl -MCPAN -e shell'
and install the module by typing 'install GraphViz'.
The GraphViz module needs additional modules, too.
txt2graph is meant to visualize the structure and dependencies of a text. I wrote it because I needed to analyze and understand how my own text compressor should work to give good results. With txt2graph it is possible to find the weights in a document. The tool is also useful for comparing documents, much as an n-gram tool does. Using the filter-file and frequency options you get small "topic" maps of documents.
If you find another use, please contact me.
txt2graph reads a text document from stdin, removes all non-alphabetic characters and builds an array (list) of words. It then converts German umlauts, because GraphViz can only handle clean ASCII in node descriptions, and outputs a dot file for a directed or an undirected graph.
The generated dot file is typically 4 to 5 times the size of the original document. This also means that dot or neato can take a very long time.
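The processing steps described above can be sketched roughly as follows. This is a Python illustration only, not the Perl code of txt2graph.pl. The function names, the umlaut table, the graph name "G" and the rule that each pair of adjacent words becomes one edge are all assumptions made for the example; the real script may define dependencies differently.

```python
import re

# Assumed transliteration table; the mapping used by txt2graph.pl may differ.
UMLAUTS = str.maketrans({
    "ä": "ae", "ö": "oe", "ü": "ue",
    "Ä": "Ae", "Ö": "Oe", "Ü": "Ue", "ß": "ss",
})


def words_from_text(text):
    """Keep only runs of letters as words, then transliterate German umlauts."""
    # [^\W\d_]+ matches runs of Unicode letters (no digits, no underscore).
    words = re.findall(r"[^\W\d_]+", text)
    # Other non-ASCII letters (e.g. accented vowels) are left untouched here.
    return [word.translate(UMLAUTS) for word in words]


def to_dot(words, directed=True):
    """Emit one edge per pair of adjacent words (an assumed dependency rule)."""
    kind, arrow = ("digraph", "->") if directed else ("graph", "--")
    lines = [f"{kind} G {{"]
    for left, right in zip(words, words[1:]):
        lines.append(f'  "{left}" {arrow} "{right}";')
    lines.append("}")
    return "\n".join(lines)
```

Calling to_dot(words_from_text(some_text), directed=False) returns the dot source as a string. Because every pair of adjacent words produces its own line, the output grows quickly with the length of the input.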
txt2graph (c)2001-2003 by Andreas Romeyke (andreas.romeyke@web.de)
txt2graph is distributed under the terms of the GNU General Public License; you will find a copy in the file COPYING in this distribution or at http://www.gnu.org.
Please check carefully whether you need a special license for commercial use. If you do, contact me by signed email (signed with PGP or GnuPG!) with the subject "txt2graph license".
Special greetings go to my girlfriend Maren for her patience, to Matthias Richter for his ideas, to Leon Brocard for his GraphViz module and to Derek Jones for his bug reports. Last but not least, greetings go to the team behind the GraphViz tools.
perl txt2graph.pl --help
perldoc txt2graph.pl
cat mytext.txt|perl txt2graph.pl --undirect > mytext.dot
neato -Tps mytext.dot > mytext.ps
Or, more simply (since version 1.2):
cat mytext.txt| perl txt2graph.pl --undirect --format=ps > mytext.ps
cat mytext.txt|perl txt2graph.pl --undirect > mytext.dot
dot -Tps mytext.dot > mytext.ps
Or, more simply (since version 1.2):
cat mytext.txt| perl txt2graph.pl --undirect --format=png > mytext.png
cat mytext.txt|perl txt2graph.pl --direct > mytext.dot
dot -Tps mytext.dot > mytext.ps
cat mytext.txt| perl txt2graph.pl --direct --format=svg --layout=twopi > mytext.svg
(since version 1.2)
perl txt2graph.pl --help | more
txt2graph.pl --infile=TODO --concentrate --layout=dot --format=ps --zeroword --records --casesensitive --minwordlen=3 --remove-multiples --outfile=test.ps
This reads the file 'TODO' and builds a PostScript file 'test.ps'. The nodes are clustered, which means that strongly related words are placed near each other. The graph is undirected, multiple links are suppressed and filtered words are replaced by a special node. Words must be longer than 3 characters and are treated case-sensitively.
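To make the effect of word filtering and link suppression more concrete, here is a small Python sketch. It is not taken from txt2graph.pl: the function names, the parameters and the placeholder node are hypothetical, and the length boundary (words of 4 or more characters are kept) is an assumption based on the description above.

```python
def filter_words(words, min_len=4, case_sensitive=True, placeholder=None):
    """Drop short words, or replace them with a placeholder node if one is given."""
    result = []
    for word in words:
        if not case_sensitive:
            word = word.lower()
        if len(word) >= min_len:
            result.append(word)
        elif placeholder is not None:
            # A filtered word is represented by one shared special node.
            result.append(placeholder)
    return result


def unique_undirected_edges(words):
    """Return adjacent-word pairs with repeated links suppressed, in first-seen order."""
    seen = set()
    edges = []
    for left, right in zip(words, words[1:]):
        key = frozenset((left, right))  # direction is ignored: a--b equals b--a
        if key not in seen:
            seen.add(key)
            edges.append((left, right))
    return edges
```

In this sketch, case-sensitive handling means that "Tree" and "tree" stay separate nodes, while suppressing multiples keeps only the first link between any two words.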
The following bugs are known:
Older versions of GraphViz dump core if the node count is high. This is not a fault in this script, but it affects it. Use '--format=canon'
to check whether this script fails, too.
Of course, there are probably further bugs that are not yet known. If you find one, please do not hesitate to contact me at andreas.romeyke@web.de
See the separate file http://andreas-romeyke.de/txt2graph/TODO. If you have suggestions or patches (preferred), contact me, too.
... coming soon ...