Tuesday, 28 May 2013

How can I 'compile' LaTeX snippets to (unicode) plain text?

I have a bunch of LaTeX snippets which I'd like to convert, as faithfully as possible, to Unicode strings.
(In fact, these are titles of papers from a bibliographic database, which I'd like to use as filenames.)
Can anyone suggest how to 'compile' a snippet of LaTeX into plain text?
Here are some examples:
{\it {A}rithm\^etik\^e stoichei\^osis}꞉ on {D}iophantus and {H}ero of {A}lexandria
On a geometry of {I}vanov and {S}hpectorov for the {O}'{N}an sporadic simple group
On a theorem of {P}l\"unnecke concerning the sum of a basis and a set of positive density
On some series containing {$\psi(x)-\psi(y)$} and {$(\psi(x)-\psi(y))^2$} for certain values of {$x$} and {$y$}
My primary concerns are converting accented characters to Unicode correctly and removing superfluous braces. I don't care about preserving formatting (e.g. \it above), and I'm happy to leave $-delimited math as is.
I'm interested in solutions that use TeX itself to do the conversion, as well as informed suggestions about doing the translation 'by hand' in some other language. Even removing braces without destroying $-delimited math seems tricky.
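
To make the 'by hand' route concrete, here is a minimal Python sketch of one possible approach, not a complete LaTeX parser. It assumes math is always delimited by single, unescaped $ signs, handles only a handful of accent commands, and drops a small hard-coded list of old-style font switches. The names latex_to_text, ACCENTS and the regular expressions are made up for this illustration.

```python
import re
import unicodedata

# Combining characters for a few common LaTeX accent commands.
# Assumption: this table is illustrative, not exhaustive.
ACCENTS = {
    "^": "\u0302",  # circumflex
    '"': "\u0308",  # diaeresis
    "'": "\u0301",  # acute
    "`": "\u0300",  # grave
    "~": "\u0303",  # tilde
}

# Matches \^e as well as \^{e}.
ACCENT_RE = re.compile(r"""\\([\^"'`~])\s*(?:\{(\w)\}|(\w))""")
# Old-style font switches such as \it; the \b keeps \item from matching.
FONT_RE = re.compile(r"\\(?:it|bf|em|rm|sl|sc|tt)\b\s*")
# Assumption: math is $...$ with no escaped \$ and no nesting.
MATH_RE = re.compile(r"(\$[^$]*\$)")


def _accent(match):
    """Turn an accent command plus base letter into one composed character."""
    base = match.group(2) or match.group(3)
    return unicodedata.normalize("NFC", base + ACCENTS[match.group(1)])


def _convert_text(text):
    """Convert a non-math segment: accents, font switches, then braces."""
    text = ACCENT_RE.sub(_accent, text)
    text = FONT_RE.sub("", text)
    return text.replace("{", "").replace("}", "")


def latex_to_text(snippet):
    """Convert text outside $...$ and pass the math segments through untouched."""
    parts = MATH_RE.split(snippet)
    # With one capturing group, the odd-indexed parts are the math segments.
    return "".join(p if i % 2 else _convert_text(p) for i, p in enumerate(parts))


if __name__ == "__main__":
    print(latex_to_text(r'On a theorem of {P}l\"unnecke concerning the sum of a basis'))
    print(latex_to_text(r"for certain values of {$x$} and {$y$}"))
```

Splitting on the math segments first is what keeps the braces inside $...$ intact: only the text between math segments ever has its braces removed. Anything beyond these assumptions (escaped \$, \( ... \) math, macros with arguments) would need a real tokenizer or TeX itself.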
