Feature #19

Adding 16-bit Unicode support for Component Pascal identifiers

Added by J. Templ over 4 years ago. Updated over 3 years ago.

Status: Closed
Start date: 10/23/2014
Priority: Normal
Due date:
Assignee: J. Templ
% Done: 100%
Category: -
Target version: 1.7
Forum topic:

Description

BlackBox 1.6 already supports extended ASCII characters from the ISO Latin-1 subset of Unicode in Component Pascal identifiers. For the benefit of the Cyrillic or Greek community, for example, 16-bit Unicode support must be added. To simplify the required changes in the compiler and runtime system, and to provide a compact encoding of plain ASCII identifiers, UTF-8 shall be used for representing Unicode identifiers both internally in the compiler and runtime system and externally in symbol and object files. This also has the advantage that the symbol and object file formats stay compatible with BB 1.6 as long as only plain ASCII characters are used.

Because Unicode characters may also occur in module names, which are mapped to file names, Unicode support must also be added to file name handling in a number of modules.

Refers to CPC-1.7 change list items 1, 7, 13, 21, 22, 23.
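
The compactness and compatibility argument can be illustrated with a small sketch. It is written in Python rather than Component Pascal purely for illustration; the helper name utf8_name and the sample identifiers are assumptions, not part of BlackBox.

```python
# Illustration only: how UTF-8 treats ASCII and non-ASCII identifiers.
def utf8_name(ident: str) -> bytes:
    """Encode an identifier as UTF-8 (hypothetical helper, not a BlackBox API)."""
    return ident.encode("utf-8")


ascii_ident = "TextModels"   # plain ASCII identifier
cyrillic_ident = "Привет"    # Cyrillic identifier

ascii_bytes = utf8_name(ascii_ident)
cyrillic_bytes = utf8_name(cyrillic_ident)

# Plain ASCII stays one byte per character and is byte-for-byte unchanged,
# which is why existing ASCII-only symbol and object files remain readable.
assert ascii_bytes == ascii_ident.encode("ascii")
print(len(ascii_ident), len(ascii_bytes))        # 10 10

# Each Cyrillic letter needs two bytes in UTF-8.
print(len(cyrillic_ident), len(cyrillic_bytes))  # 6 12
```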


Related issues

Related to BlackBox Component Builder - Bug #120: The interface is not showing for aliases Closed 08/03/2016
Related to BlackBox Component Builder - Bug #57: wrong encoding of "module not found" message Closed 05/29/2015
Related to BlackBox Component Builder - Bug #132: Trash in the definitions for extended records with unicod... Closed 09/28/2016

Associated revisions

Revision e906546f
Added by J. Templ about 4 years ago

Full Unicode support added for CP identifiers as proposed by Helmut Zinn in CPC-1.7rc4. Refs: #19.
The foundation is in modules Strings and Kernel as proposed by Helmut Zinn.
Strings.IsIdent optimized by JT for speeding up the CP compiler.
Procedure Kernel.GetModName added according to Trurl.
Procedure Kernel.LoadDll with ARRAY OF CHAR parameter as proposed by Helmut Zinn.
Kernel.SourcePos and DevDebug.SourcePos unified as proposed by Helmut Zinn.

Compiler, Linker, Loader and affected modules adapted according to Helmut Zinn. Some files also contain minor syntactical cleanups at unrelated places.

Changes by Josef Templ:
ASSERTs for Utf8-conversions added in Kernel, DevLinker, DevDependencies, and StdLoader.
DevLinker simplified by using an auxiliary procedure WriteUtf8.
The constant 256 replaced at several places by LEN or similar constructs.
The change color removed.
The language report in Docu/CP-Lang.odc adapted for using Unicode.
Minor improvements in System/Docu/Strings.odc.

Signed-off-by: Josef Templ <>

Revision 518e5225
Added by I. Denisov about 4 years ago

The script compiler updated according to the new Unicode version. Refs: #19

Revision 6126a3b9
Added by I. Denisov about 4 years ago

Utf8ToString upgraded according to RFC 3629 by Alexander Shiryaev. Refs: #19

Revision 2fe831f9
Added by J. Templ about 4 years ago

Docu added for Utf8-conversions. Refs: #19.
Docu aligned with implementation for ToUpper and ToLower.

Signed-off-by: Josef Templ <>

Revision 7c21eeed
Added by I. Denisov about 4 years ago

Utf8ToString upgraded according to the new Docu. One error fixed by Alexander Shiryaev. Refs: #19

Revision a2d3fa47
Added by I. Denisov about 4 years ago

Error fixed in StdLoader. Refs: #19

Revision 19bc1d2e
Added by I. Denisov about 4 years ago

Removed forced SHORTCHAR to CHAR conversion in case of "decode incomplete or error". Refs: #19

Revision ebf1efc0
Added by J. Templ about 4 years ago

Utf8-conversion without contents check as proposed by J. Templ. Refs: #19.
Docu for module Strings states explicitly that there are no contents checks in Utf-8 conversions.
Minor simplifications in StdLoader.
HostMenus.ReadCommandLine.CopyName uses module Strings for handling lower-/uppercase characters.
Kernel.LoaderHook parameter 'name' changed to 'utfName'.

Signed-off-by: Josef Templ <>

Revision ddad68b5
Added by I. Denisov about 4 years ago

Utf8-conversion with full format check as proposed by WenYing Luo. Refs: #19.

Revision 4eb79e61
Added by J. Templ about 4 years ago

Utf8-conversion without checking illegal Unicode characters according to vote. Refs: #19.
(see http://forum.blackboxframework.org/viewtopic.php?f=4&t=154)
Optimization of error detection in 3-byte sequences according to WenYing Luo: testing for ch < 0 is not required.
Kernel.LoaderHook.ThisMod: parameter 'name' changed to ARRAY OF CHAR according to Helmut Zinn: this avoids some Utf8-conversions.

Signed-off-by: Josef Templ <>

Revision 9ba7ce64
Added by J. Templ about 4 years ago

Merge pull request #7 from BlackBoxCenter/issue-#19

Refs: #19.

Revision 4951f73b
Added by I. Denisov over 3 years ago

missing Utf-8 conversions added. Refs: #19.
As proposed by Ivan Denisov.

Signed-off-by: Josef Templ <>

Revision 706dcddf
Added by J. Templ over 3 years ago

Merge pull request #48 from BlackBoxCenter/issue-#19

missing Utf-8 conversions added. Refs: #19.

Revision d0df425e
Added by J. Templ over 3 years ago

missing Utf-8 conversions for TypeName added. Refs: #19.

Signed-off-by: Josef Templ <>

Revision 88682894
Added by J. Templ over 3 years ago

bug fix in Stores.ThisType for supporting Unicode module names. Refs: #19.

Signed-off-by: Josef Templ <>

Revision 56d42342
Added by J. Templ over 3 years ago

missing support for Unicode added. Refs: #19.
SqlControls uses a new maxBaseVersion similar to module Controls.
OleClient uses Utf8-conversion for Unicode CP identifiers in Model.link.

Signed-off-by: Josef Templ <>

Revision cd27b5aa
Added by J. Templ over 3 years ago

merge conflict with issue #19 resolved. Refs: #19.

Signed-off-by: Josef Templ <>

Revision 24af400c
Added by J. Templ over 3 years ago

merge conflict with master resolved. Refs #19.

Signed-off-by: Josef Templ <>

Revision 4a467842
Added by J. Templ over 3 years ago

Merge pull request #56 from BlackBoxCenter/issue-#19

Refs: #19.

Revision 60685de0
Added by I. Denisov over 2 years ago

Searching for Unicode identifiers fixed. Refs: #19

Revision 0d821d65
Added by I. Denisov over 2 years ago

Change Kernel to Strings in the fix of the SearchIdent procedure. Refs: #19

Revision dbca13ea
Added by I. Denisov over 2 years ago

Extra comment about issue-#19 removed. Refs: #19

Revision c87ffcbd
Added by J. Templ over 2 years ago

Unicode support for DevProfiler improved. Refs: #19.
In addition, Strings.IsIdentStart and Strings.IsAlpha optimized for ASCII.

Signed-off-by: Josef Templ <>

Revision 57ed1853
Added by J. Templ over 2 years ago

missing Utf-8 conversion added to CheckModName. Refs: #19.

Signed-off-by: Josef Templ <>

Revision 2aefb3d9
Added by J. Templ over 2 years ago

array toUpper fixed for CHR. Refs: #19.

History

#1 Updated by J. Templ about 4 years ago

  • Description updated (diff)
  • % Done changed from 0 to 50

#2 Updated by J. Templ about 4 years ago

  • Subject changed from Adding full Unicode support for Component Pascal identifiers to Adding 16-bit Unicode support for Component Pascal identifiers
  • Description updated (diff)

#3 Updated by I. Denisov about 4 years ago

  • Status changed from New to In Progress
  • % Done changed from 50 to 70

According to RFC 3629, today's de facto standard for UTF-8 (http://tools.ietf.org/html/rfc3629), any valid UTF-8 string should match the following "Syntax of UTF-8 Byte Sequences":

   UTF8-octets = *( UTF8-char )
   UTF8-char   = UTF8-1 / UTF8-2 / UTF8-3 / UTF8-4
   UTF8-1      = 00X-7FX
   UTF8-2      = C2X-DFX UTF8-tail
   UTF8-3      = E0X A0X-BFX UTF8-tail / E1X-ECX 2( UTF8-tail ) / EDX 80X-9FX UTF8-tail / EEX-EFX 2( UTF8-tail )
   UTF8-4      = F0X 90X-BFX 2( UTF8-tail ) / F1X-F3X 3( UTF8-tail ) /  F4X 80X-8FX 2( UTF8-tail )
   UTF8-tail   = 80X-BFX

Alexander Shiryaev wrote an algorithm that converts UTF-8 to Unicode while checking the validity of the input.
http://forum.oberoncore.ru/viewtopic.php?f=127&p=89571#p89571
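
To make the grammar above concrete, here is a minimal validator sketch in Python. It follows the quoted byte ranges rule by rule; it is not Alexander Shiryaev's algorithm, and the function name is_valid_utf8 is an assumption chosen for illustration.

```python
def is_valid_utf8(data: bytes) -> bool:
    """Check a byte string against the RFC 3629 UTF-8 byte sequence syntax."""
    i, n = 0, len(data)
    while i < n:
        b0 = data[i]
        if b0 <= 0x7F:                       # UTF8-1
            i += 1
            continue
        # Select the number of tail bytes and the allowed range of the
        # second byte, which depends on the lead byte.
        if 0xC2 <= b0 <= 0xDF:               # UTF8-2
            tails, lo, hi = 1, 0x80, 0xBF
        elif b0 == 0xE0:                     # UTF8-3, excludes overlong forms
            tails, lo, hi = 2, 0xA0, 0xBF
        elif 0xE1 <= b0 <= 0xEC or 0xEE <= b0 <= 0xEF:
            tails, lo, hi = 2, 0x80, 0xBF
        elif b0 == 0xED:                     # UTF8-3, excludes surrogates
            tails, lo, hi = 2, 0x80, 0x9F
        elif b0 == 0xF0:                     # UTF8-4, excludes overlong forms
            tails, lo, hi = 3, 0x90, 0xBF
        elif 0xF1 <= b0 <= 0xF3:
            tails, lo, hi = 3, 0x80, 0xBF
        elif b0 == 0xF4:                     # UTF8-4, caps at U+10FFFF
            tails, lo, hi = 3, 0x80, 0x8F
        else:
            return False                     # 80..C1 and F5..FF never lead
        if i + tails >= n:
            return False                     # sequence is truncated
        if not lo <= data[i + 1] <= hi:
            return False
        for k in range(2, tails + 1):        # remaining bytes are plain tails
            if not 0x80 <= data[i + k] <= 0xBF:
                return False
        i += tails + 1
    return True
```

For example, is_valid_utf8("Привет".encode("utf-8")) holds, while the overlong sequence b"\xC0\x80" and the encoded surrogate b"\xED\xA0\x80" are both rejected.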

#4 Updated by I. Denisov about 4 years ago

Helmut found one error!

Helmut wrote:

Dear BlackBox User,

two days ago I received an e-mail from Hans Klaver:

Dear Helmut,
Today I downloaded BlackBox 1.7-RC4 for Windows.
In case you do not know already: the BB Logo is missing from the About BlackBox dialog and from the Guided Tour document.

Bye,
Hans Klaver

Without his feedback I would not have known that the uploaded version had an error.
I immediately rolled back to the last known good version and searched for the
error.

I caught the error with the following steps:
1. select the word DevCPM.LogWStr
2. Info -> Search in Source
3. click on the link Dev/Mod/CPP.odc
4. Dev -> Compile
command error: cannot load module DevCompiler

Notes:
The error is reported from StdInterpreter.ShowLoaderResult via
CannotLoaderResult.
The module DevCompiler exists, and after a restart of BlackBox I can
compile. So what happens?

I found the error in module StdLoader
PROCEDURE (h: Hook) ThisMod (IN name: ARRAY OF SHORTCHAR): Kernel.Module;

at line
VAR m: Kernel.Module; ms: ModSpec; n: Kernel.Name; res: INTEGER;

The variable definition res: INTEGER; must be deleted.

The correct line is
VAR m: Kernel.Module; ms: ModSpec; n: Kernel.Name;

Please have a look at Help -> About BlackBox

Version 1.7-RC4 Built number 10 on 20.10.2014 is OK
Version 1.7-RC4 Built number 11 on 30.10.2014 is erroneous
Version 1.7-RC4 Built number 15 on 11.11.2014 is OK

I apologize for the inconvenience caused by my mistake.

With best regards

Helmut Zinn

#5 Updated by I. Denisov about 4 years ago

  • % Done changed from 70 to 80

luowy suggested a better converter that also handles surrogates and returns res in a clever way.

StdCoder.Decode ..,, ..fv....3QwdONl9RhOO9vRbf9b8R7fJHPNGomCrlAyIhgs,CbKBhZ
 xi2,CoruKu4qouqm8rtuGfa4.hOO9vRb1Y66wb8RTfQ9vQRtIdvPZHWKqtCa.E.U5Usp,6.5Qw
 dONlnayKmKKqCLLCJuGqayKm6F9vQ5nsH3.bnayKmKa2,Cor.kay4.qorGqmQCU2,CJuyKtQC9
 8P9PP7ONbXmb.2.AdAk5kUm.,6.k39.86.QC18RdfQHfMf9R9vQ7ONb1E.kHE.0.p.,6.jdLL3
 0EJYjyC.6.VQ.E4k.8Mtf.2.S02.e,2UgW.Ue.E.mP,UAU0IkmL,6.Y32.I16.j,6.J,U.YLk.
 0.85CE,9T3E.0.n00.p.0U.460.J,U.2GE4E.q,CE3U2V1w,61s.VU.64s.T.S.8E0E08Mtf.2
 .y20E.c4E.2E2.e0U.2Uw0e.8EOE.a78k8E.a,8k.E.U1o.2U5U3IkmL,6..EBU.YJ2.I3.,6.
 V2g0MR1U1A20k0u0I,QU,U.A2I.6.FR.QUDU.21gUdU1Y,MD6.1U.QUF2.0k7k0e,0kIE,O,2,
 ,E,4.0k7k,C,0E1k044Ck0E0G.CE,U8U.Y2QUZU.g2gUPU1Y0gUX,9.7.CE,k0a,0EJ2.5s16.
 d0zT1H6IZuH5OF7OJZOF,NJdfNl7JTvIdfQHfPDf8,78HeH,NRdfNldC,NEZeI1OK,tHB86b8G
 TeIduEFOEZuC,tHf8J,tQdfQp761eI.CIY42UmhgnJbUAdCZe3xc3JedQbBAV7QcDpdHZeUAhg
 ZhZxgVZh0BjohgUgbUAav2YoJipphXBgohgY3Yx2Yl2av2Ze2YmhgnhigZiUIZdgV7AV1,Oqo8
 rtGLEqHE0nR0Gu4qomKEqHE4nRWGJ0mtGrkGrmemIqk4ak2OpU8JEWLK0momGEeKK0mq4KweHE
 aIb.rN1HM0HsMFfC,tIF0UBUnZZUQimIbUAdC,2YcIZUQC66JN8PU7Yiu2Y7,.Grka43PSdPNb
 96JN8P.TvON76bPRZfQp763uHT8H9OERuCH66Fd8,tQffQyqn4KuKKE.qk2aEfEIeGcKIcCHQC
 HJam4aU7Igppgu2Y,,CHEyIX0md..ohg2YhJbUAdC,g,g,3OFDOGR86ZPN0GRqHE0nRqk2ako0
 GRqHE66J96pND,,PPMl96pND,7H9eHFtQdfQH76P76XtC,tQ,dCvFnaKtsC,N1HM0j8GH8H986
 FNRdfNipoqJECGE0HgaGEOGEWGp0GS0mq4ad2Y2xdUgV7M05HEenSgiopAsCPM0akYOIECLEqH
 EO42YI3d3pdUgV7A,HcMQfkgfUIbxsMFvC,dP,dCvlMi1Z76pND01bPRZHEenSoc,ZdHhcv2YU
 gV7A,HsE1uI98659O,tHB86PM0AVw3Yl2fcIZk2feAZioZrocMJbUQioJiPJhR3Yug55nRAdCR
 ccIhdQbBU7YDVtEZ7KRd9V7FB8Kp76l96pNDyIdGIICKoaGEqGEAbmQbUQiUUoBgdtC,7R,dCv
 lMgV7k2m598Ale9R7A9eFleC,N1HU76S,dCw7.IamYav2YBU7MGQgc3Yx2Ykgck66d8GsQZ76M
 AU7MFNuIHeF,tM.78K,7J.ENin4a.HkWuIWin4a.Hkt0GR6R.EN.H6Tock2fio3B8BleC,N1M0
 W5w7.6BVtC,N1M0a2.B8A0Ge.UnQbUgV7k2gcA,.GXU..Gn4ak2A,9eH.HktUo,.bVnhCUIJeJ
 hcvgV7k2KIagcU2ZesMTfPbPRPPN,dNHHuCLu0mom46631,,M0CLu8rh.CIY8JI0HWCIM0HY0m
 J0mb8JWUdQbUAdC,0GtKqtUl.akWu2UAVBAV7M0THEenSY866PM0AV1,bfA,tHBO1HM0HM0tXu
 2Y7pcU2ZX3hUYbU2as2aMBZDJecQgc3Yy2YkIc43fd2YI3d3lriKEe1B0iX3pd2R5M0tnMeHEa
 IX.tV,3aMBZD,u1.....aEyIau2Y7p6ES6C.cDAb43fd22....aEyQau2Y7p6ESMCV7KH,cDI6
 ....U7YDddC,NG.tVs.UyEQOIgaGE....M09eH22M0HWjRBd8G9WBU76F9uEF7RHtC,7S,dC2j
 UIZUoao2Yf22Ud224HNWnR0m4k2A7GLEq1,7JF0k2MGQipVI3d3FIeGEC5y4.M06F6SN76X7AV
 7AV7GHtCPM0A,QC.Q6EQ0HMWIE6S,7FHeJ,7BV7AFGIemayIW0GO0HMIZdAZv22.P.H.b1.I8Q
 6s8MHT8F,,aWUA7UU2ZeghVBjWhgUIhyghV3jU2YeA3M0gcAl4ak2A,QC..lP8r76.g,A,KIbM
 1M0QiUUaltQ5k2gcAFE8quOqhuqi0GRsMMGcPHtC,tQI501k2gcC,AV3Z7WGJ0momKq.kt.A,B
 uHZ86P96pND22T86R96P76X767uHU7kYcO,7Dv76PPMl96d8GsQd1U1Vk.kbElKLnghRBZdQbU
 .J190,78J767uEl8K,76J,A,9eHg,A,ZPNb96MA0GWKoVWmoamRQiUUa,UUohjZiUQgjpBkomK
 q.dPMHHEUU.AV3,.U7pd13ZdB3PM0K2kYOYcQiUg5dPMAZa2Zi3Yy2YkAZUY868KLr8rmCrrmK
 vKKm0Gla5bf8HN1cF.24..k2a2J1.kt.kV....MGcO.00G2.MFEtK4MAq.90PU7luG566EEG3A
 dCR6ZPNb99,NAVN8,NFR8F0GI6RZPRUYd8kYcOuHEqqkW5..kR0Gp00qqkQbUgcCl4sQd1Uk2f
 vgV7gcC76f8RBHeQ8a4rN1HcUXDJ9X1xhiZimxhgZhZJinpZHZC58RZ9P7ONbvM,Mwd0.UiQcj
 pho,YcZRiX3.5011.85...CLL.U2V.IS2U.UIU.U76.2..AU0CyIVGhighgmRiiQ88pum470,M
 wd0UnpZGhighA70,cw5.0.LJ.w.QI2U.sU.ktumdsIdPSNPN7ONbH.4D.o3aLq.,cwFE.2..F.
 pG.2U.E,,.RNEd1K5GomCb.6,6..UYU.AU.U.UUQoOF.2Uwpr,6C5H.WnlM.E.cUZj0E..UO.,
 .1.eWwV.E.0t.U...Xi0...
 --- end of encoding ---
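
For readers unfamiliar with the term, "handling surrogates" usually means representing a code point above U+FFFF as two 16-bit units, which matters when a character type is 16 bits wide. The Python sketch below shows only that standard UTF-16 arithmetic. It is not a decoding of the attachment above, and the function name to_utf16_units is an assumption.

```python
def to_utf16_units(code_point: int) -> list:
    """Split a Unicode code point into one or two 16-bit UTF-16 units."""
    if code_point < 0x10000:
        return [code_point]             # fits into a single 16-bit unit
    v = code_point - 0x10000            # 20 bits remain
    high = 0xD800 + (v >> 10)           # upper 10 bits -> high surrogate
    low = 0xDC00 + (v & 0x3FF)          # lower 10 bits -> low surrogate
    return [high, low]


print([hex(u) for u in to_utf16_units(0x0416)])   # ['0x416']
print([hex(u) for u in to_utf16_units(0x1F600)])  # ['0xd83d', '0xde00']
```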

#6 Updated by I. Denisov about 4 years ago

  • Status changed from In Progress to Closed
  • % Done changed from 80 to 100

Resolved and applied in master branch.

The final solution uses a simple format check, according to the Center's decision.
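
One possible reading of a "simple format check" is sketched below in Python: the decoder verifies only the bit patterns of lead and tail bytes and does not reject overlong forms or other illegal code points. This is an assumption for illustration, not the actual Strings implementation. The name utf8_to_string16 and the (text, ok) result are hypothetical, and only 1- to 3-byte sequences are accepted because the target characters are 16 bits wide.

```python
def utf8_to_string16(data: bytes):
    """Decode UTF-8 into 16-bit characters, checking byte patterns only.

    Returns (text, ok); ok is False if decoding stopped at a malformed
    or unsupported sequence, and text holds what was decoded so far.
    """
    out = []
    i, n = 0, len(data)

    def is_tail(pos):
        # A tail byte has the bit pattern 10xxxxxx.
        return pos < n and (data[pos] & 0xC0) == 0x80

    while i < n:
        b0 = data[i]
        if b0 < 0x80:                                    # 0xxxxxxx
            out.append(chr(b0))
            i += 1
        elif (b0 & 0xE0) == 0xC0 and is_tail(i + 1):     # 110xxxxx 10xxxxxx
            out.append(chr(((b0 & 0x1F) << 6) | (data[i + 1] & 0x3F)))
            i += 2
        elif (b0 & 0xF0) == 0xE0 and is_tail(i + 1) and is_tail(i + 2):
            cp = ((b0 & 0x0F) << 12) | ((data[i + 1] & 0x3F) << 6) \
                 | (data[i + 2] & 0x3F)                  # 1110xxxx + 2 tails
            out.append(chr(cp))
            i += 3
        else:
            return "".join(out), False                   # bad or 4-byte lead
    return "".join(out), True
```

With this sketch, utf8_to_string16("Привет".encode("utf-8")) yields ("Привет", True), while the overlong sequence b"\xC0\x80" is accepted and decoded to the character with code 0, which is exactly the kind of input a full content check would reject.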

#7 Updated by I. Denisov over 3 years ago

  • Status changed from Closed to In Progress

I found that Externalize and Internalize do not work correctly for Views and Models from modules with Cyrillic identifiers.

#8 Updated by J. Templ over 3 years ago

  • Status changed from In Progress to Closed

#9 Updated by I. Denisov over 2 years ago

  • Related to Bug #120: The interface is not showing for aliases added

#10 Updated by I. Denisov over 2 years ago

  • Related to Bug #57: wrong encoding of "module not found" message added

#11 Updated by I. Denisov over 2 years ago

  • Related to Bug #132: Trash in the definitions for extended records with unicode identifiers added
