Use of UNICODE in Spike2


Poll: How do you use non-ASCII characters in Spike2?

Never, and I never will: 0 votes
Never, but it would be useful if I could: 1 vote (50%)
Only in comments for scripts and output sequences: 0 votes
In script comments and in script literals that are written to the log view: 0 votes
Generally, including in channel information and user-defined dialogs and toolbars: 1 vote (50%)

Total votes: 2

Greg Smith
Major contributor
Posts: 1634
Joined: 19 Jun 2008, 12:27
Software used: Spike2 and Signal
1401 type: Many 1401 types
Location: Cambridge, England

Use of UNICODE in Spike2

Post by Greg Smith » 30 Jul 2014, 16:30

Edit: This post refers to a time before we implemented UNICODE in Spike2 version 8. Both Spike2 and Signal now use the Unicode character set; the transition seems to have been pretty painless (at least we have not had any complaints!)

Currently Spike2 (and Signal) deal with text internally as 8-bit characters. The characters with codes 0-127 are ASCII and have fixed meanings, while characters with codes 128-255 have meanings that depend on the "code page" that is set in the operating system. These extra characters in the code page allow you to use non-ASCII national characters in scripts for comments and in literal strings ("string"). However, if text that is encoded for one code page is sent to a user using a different one, the result is a mess (except for ASCII characters in the range 0-127). Worse, if you use a language that requires many thousands of characters (Chinese or Japanese, for example), you have little hope of success.
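
To make the code page problem concrete, here is a minimal sketch in Python (not the Spike2 script language). The helper name decode_with and the choice of code pages are assumptions made for illustration; the point is only that one byte above 0x7f reads as a different character on each machine, while ASCII text survives.

```python
# A single byte above 0x7f, as written on a machine using a Western
# European code page (Windows-1252).
raw = "ø".encode("cp1252")  # b'\xf8'


def decode_with(code_page: str, data: bytes) -> str:
    """Decode bytes the way a machine using `code_page` would (hypothetical helper)."""
    return data.decode(code_page)


western = decode_with("cp1252", raw)   # the character the author typed
cyrillic = decode_with("cp1251", raw)  # same byte on a Cyrillic code page
greek = decode_with("cp1253", raw)     # same byte on a Greek code page

# Codes 0-127 mean the same thing everywhere.
ascii_part = decode_with("cp1251", b"Channel 1")

print(western, cyrillic, greek, ascii_part)
```

The three decoded characters differ even though the stored byte is identical, which is the "mess" described above.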

We are experimenting with changing over to using a UNICODE-based system (an international standard system that allows around 1 million different characters). If we do this, then users in China will be able to type comments and strings in scripts in Chinese and if they send such a script to me in England, the characters will still be correct (as long as I install the right language support), regardless of my local code page. This will also allow the use of special characters (pi, the degree sign, etc).

It is relatively easy for us to allow you to type comments into your scripts in your local language, as the text editor we use can work with UTF-8 as well as with the local code page. Simple-minded use of this also allows us to put such text into script strings, and the text will display correctly if the output is sent to the log view, for example. However, if you use this to set a channel title, for example, it will display as rubbish characters in a time view. Getting these to display correctly is a vastly bigger task: converting the entire program to use UNICODE.

The reason I am writing this is to ask you if you are already making use of the local code page to encode text other than ASCII characters. For example, if you are in Europe, are you using non-ASCII national characters (ø, à and the like)? If you are in Japan, are you writing using ひらがな or in 日本語? Do you write comments in Chinese or Korean? It may be that these characters do not display in your browser...

If no-one is doing this, it would allow us to switch very quickly to using UTF-8 encoding for the script system (so you can write comments in your local language system), with a slower transition to full UNICODE mode throughout the program. However, if a lot of people are already using code page based national characters, particularly if they are using them in time and result views and in tool bars and in user-defined dialogs, then we cannot make this change until the entire program is converted.

Please respond to the poll to let us know what you do (or would like to do).
Greg Smith Cambridge Electronic Design

Pawel Kusmierek
Major contributor
Posts: 1413
Joined: 02 Jul 2008, 13:23
Software used: Spike2
1401 type: Power1401 mk II
Location: Georgetown University, Washington DC, USA

Re: Use of UNICODE in Spike2

Post by Pawel Kusmierek » 30 Jul 2014, 16:51

I replied "Never but it would be useful", meaning rather "might turn out to be useful".

I mean that I never do it and I am not planning to, but it may happen that I'll have to run a script on a machine which uses Unicode paths (e.g., user account name) and then I'd need to handle such paths.
Paweł Kuśmierek

Spike2 Power 1401 Mk II


Re: Use of UNICODE in Spike2

Post by Greg Smith » 30 Jul 2014, 18:35

I should have mentioned that if we do the entire job of converting, then Unicode path names to files will also be allowed. Another use, relevant to US users, would be the ability to put any Unicode symbol on a label or button (which I have been asked for).

Marin Manuel
Major contributor
Posts: 138
Joined: 02 Jul 2008, 15:21
Software used: Spike2 and Signal
1401 type: Power1401
Location: Chicago, IL

Re: Use of UNICODE in Spike2

Post by Marin Manuel » 31 Jul 2014, 18:58

I tend to write everything in English in Spike2 because of this limitation. However, because I work in a non-English-speaking country, we have occasional problems with accented characters in path names and such, so a move to Unicode would be great for us.


Re: Use of UNICODE in Spike2

Post by Greg Smith » 01 Aug 2014, 09:38

When we make the transition there will be some discomfort for people who have used national characters in data files and resources. Scripts should be OK as (in most cases) I can tell the difference between a script that holds national characters using the current code page and one that holds UTF-8 or even wide characters and I can translate them. This will mean that each script and include file will need opening and then saving to achieve the translation.
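
As an illustration of how such a detection might work, here is a rough sketch in Python. It is not how Spike2 does it; the function name decode_script and the default code page are assumptions. It checks for a byte order mark (wide characters or UTF-8), then tries strict UTF-8, and only then falls back to the local code page.

```python
import codecs
from typing import Tuple


def decode_script(data: bytes, code_page: str = "cp1252") -> Tuple[str, str]:
    """Return (text, detected_encoding) for the raw bytes of a script file.

    Hypothetical helper: `code_page` stands in for the machine's current
    code page.
    """
    # Wide characters: a UTF-16 byte order mark at the start of the file.
    if data.startswith((codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE)):
        return data.decode("utf-16"), "utf-16"
    # UTF-8 saved with a byte order mark.
    if data.startswith(codecs.BOM_UTF8):
        return data.decode("utf-8-sig"), "utf-8"
    # Strict UTF-8 decoding rejects most code page text that uses codes
    # above 0x7f; pure ASCII passes, which is harmless.
    try:
        return data.decode("utf-8"), "utf-8"
    except UnicodeDecodeError:
        return data.decode(code_page, errors="replace"), code_page


print(decode_script("' Größe".encode("cp1252")))
print(decode_script("' Größe".encode("utf-8")))
```

The "in most cases" caveat applies to a sketch like this too: a short run of code page characters can, by chance, form a valid UTF-8 sequence and be misread.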

However, doing the detection and translation for every string everywhere may not be feasible. We will not know until we get there. One possibility would be to write an application to "fix" strings in data files. This will always be possible for the new 64-bit son files, as the string storage is not constrained to fixed lengths in the same way as it is for the 32-bit system. National characters will take more space when stored in UTF-8 than they do using code pages, so it is possible that in 32-bit files, "fixing" strings may result in truncated strings. Similarly, it may be possible to write an application to "fix" XML files by detecting whether the included strings hold illegal UTF-8 sequences and, if they do, assuming that these are code page characters and translating them.
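
The space problem can be sketched in Python as follows. The function name utf8_fit and the 8-byte limit are made up for illustration and do not reflect the actual field sizes in 32-bit files. The sketch shows that accented characters grow from one byte to two in UTF-8, and that a fixed-length field may then force a truncation, which should at least avoid cutting a character in half.

```python
def utf8_fit(text: str, max_bytes: int) -> bytes:
    """Encode text as UTF-8, dropping trailing characters until it fits.

    Hypothetical helper; `max_bytes` stands in for a fixed-length field.
    """
    encoded = text.encode("utf-8")
    if len(encoded) <= max_bytes:
        return encoded
    # Cut at the byte limit, then discard any incomplete trailing sequence
    # so the stored bytes remain valid UTF-8.
    return encoded[:max_bytes].decode("utf-8", errors="ignore").encode("utf-8")


title = "Übergröße"
code_page_size = len(title.encode("cp1252"))  # one byte per character
utf8_size = len(title.encode("utf-8"))        # accented letters take two bytes
stored = utf8_fit(title, 8)                   # 8 is an illustrative limit

print(code_page_size, utf8_size, stored.decode("utf-8"))
```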

Anyhow, don't hold your breath for this. It is a HUGE task to migrate a program designed for the ASCII character set to UNICODE while keeping everything the same as far as users are concerned. It may have to wait for us to entirely drop support for the old binary resource files. The change to XML resources was partly so we could make this transition as the old binary resources use fixed length ASCII strings. We changed to XML while still reading the old format in version 7.11 in February 2013, so folks can always use the latest version 7 release to translate old binary resources to XML even if we stop doing this in version 8.


Re: Use of UNICODE in Spike2

Post by Greg Smith » 22 Oct 2014, 18:21

We seem to have got this all to work. Version 8.03 will be a Unicode version. If you do not make use of any character codes above 0x7f, that is, you only use a-z, A-Z, 0-9, control characters, space, ! " # $ % & ' ( ) * + , - . / : ; < = > ? @ [ \ ] ^ _ ` { | } and ~, then you will see no difference, and any files you write from Spike2 will be as compatible with version 7 as ones written by version 8.02e.

This works because we use UTF-8 for external file storage of text, and characters with codes 0x00 to 0x7f in UTF-8 are identical to ASCII.
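
This property is easy to check. A small Python sketch follows; the function name is_ascii_only is an assumption, and this is not a Spike2 feature. It tests whether a string stays within codes 0x00 to 0x7f, in which case its UTF-8 bytes are exactly its ASCII bytes.

```python
def is_ascii_only(text: str) -> bool:
    """True if every character has a code in the range 0x00 to 0x7f."""
    return all(ord(ch) <= 0x7F for ch in text)


plain = "Channel 1: EEG (uV)"
greek = "Channel 1: EEG (µV)"

# For ASCII-only text the UTF-8 bytes and the ASCII bytes are the same.
same_bytes = plain.encode("utf-8") == plain.encode("ascii")

print(is_ascii_only(plain), is_ascii_only(greek), same_bytes)
```

Text that passes this check is stored identically whether it is written as ASCII or as UTF-8.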

If you have used characters with other codes (for example, accented letters in European languages or an Asian character set through the Windows code page), you should find that Spike2 reads your old files, detects that they do not contain UTF-8 codes and then converts the characters into Unicode, assuming that they are in the current code page.

The only problem comes about when you save new files that contain character codes greater than 0x7f. These will now be saved as UTF-8. If you then try to read these files with a non-Unicode version of Spike2, the files will read OK, but any extended characters will become gibberish. This does not render the file unusable; only the text characters with codes above 0x7f will be unreadable. Basically, this means that if you want to write files that can be read by users of Spike2 version 7 or earlier, you must restrict yourself to the ASCII character set.
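
What "gibberish" means here can be simulated in Python, assuming for the sake of the example that the older program treats the bytes as Windows-1252 text (the actual code page depends on the reader's machine):

```python
# Text saved by a Unicode-aware program: UTF-8 on disk.
saved = "Alpha band (α), 8-13 Hz".encode("utf-8")

# A non-Unicode reader interprets each byte through its code page
# (cp1252 is an assumption for this example).
legacy_view = saved.decode("cp1252")

print(legacy_view)
```

The ASCII portion comes through unchanged; only the Greek letter, stored as two bytes in UTF-8, turns into two unrelated code page characters.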

We are making this change for two reasons. Firstly, the development tools we use are transitioning to Unicode only, so our hand is forced, at least in the middle to long term. Secondly, we want to make it much easier for people to use our tools with their own language for annotations and comments. We have no immediate plans for changing the menus and dialogs to use multiple languages, though that is now entirely possible (we would likely need user input on what each menu item should be in your language!). We do not have support for right-to-left languages.

Even if you will always use ASCII characters for normal writing, using Unicode allows a wide variety of other characters to be used in your files and for toolbars and in user-defined dialogs, for example Greek letters to label EEG power bands and so on.

Version 8.03 has many other nice features, including making it much easier to work with large numbers of channels, support for initialising arrays with values in the script language, and support for generating "heat maps" fast enough to be used in real time!
Attachment: winsp399.gif (animated map of EEG activity generated from Spike2 images)
