Jonathan Benedicto
CBuilder Developer |
2005-08-02 03:52:01 AM
case insensitive wchars
I'm writing a platform-independent function that compares two wchar_t strings, with and without case sensitivity. But I've run into a problem trying to make the two strings lowercase so that I can compare them. How can I convert wchar_t strings to lowercase? I'm using std::wstring to handle the strings.
Jonathan |
| maeder
CBuilder Developer |
2005-08-02 04:39:28 AM
Re:case insensitive wchars
"Jonathan Benedicto" < XXXX@XXXXX.COM >writes:
Quote: I'm writing a platform-independent function that compares two
Quote: But, I've run into a problem with trying to make the two strings |
| Jonathan Benedicto
CBuilder Developer |
2005-08-02 04:46:47 AM
Re:case insensitive wchars
"Thomas Maeder [TeamB]" < XXXX@XXXXX.COM >wrote in message
Quote: How are they encoded?
Quote: Not at all. No software is able to do that correctly, because doing it
I'm very sorry for being so ignorant about this. I'm just learning about Unicode.
Jonathan |
| JF Jolin
CBuilder Developer |
2005-08-02 07:45:04 AM
Re:case insensitive wchars
Jonathan Benedicto < XXXX@XXXXX.COM >wrote:
Quote: I'm very sorry for being so ignorant about this. I'm just learning about
But as the documentation says: For some locales, the lstrcmpi function may be insufficient. If this occurs, use CompareString to ensure proper comparison.
--
JF Jolin |
| Jonathan Benedicto
CBuilder Developer |
2005-08-02 07:50:46 AM
Re:case insensitive wchars
"JF Jolin" < XXXX@XXXXX.COM >wrote in message
Quote: I am in the same situation. So take this reply for what it's worth.
Quote: Win32 API lstrcmpi() can perform a comparison with no case sensitivity. |
| maeder
CBuilder Developer |
2005-08-02 01:39:52 PM
Re:case insensitive wchars
"Jonathan Benedicto" < XXXX@XXXXX.COM >writes:
Quote: >How are they encoded?
Quote: >Not at all. No software is able to do that correctly, because doing
"character" is overloaded, but you know what I mean :-) ).
English texts are commonly written in the Latin alphabet, which is one of the few alphabets used on this planet that distinguish between uppercase and lowercase characters (the other one I know is the Cyrillic alphabet).
To correctly perform the conversion to lowercase in German and French text (both typically written in the Latin alphabet), your software needs to understand the text. I'd assume that this applies to other languages as well, but I can't tell for sure.
So if you *know* that a certain text is in English, it may be safe to do the conversion to lowercase character by character, as typical functions do it.
Quote: I'm very sorry for being so ignorant about this. I'm just learning about |
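The character-by-character conversion described in this post, for text that is known to be English, might look roughly like the C++ sketch below. It is not code from the thread: `to_lower_per_char` and `equal_ignore_case_simple` are hypothetical names, and the sketch assumes the program runs in the default "C" locale, where `std::towlower` can be relied on only for A-Z.

```cpp
#include <cassert>
#include <cstddef>
#include <cwctype>
#include <string>

// Hypothetical helper (not from the thread): lowercases every wchar_t
// independently with std::towlower. Only reasonable for text known to be
// English; one-to-many mappings such as "ß" <-> "SS" cannot be expressed.
inline std::wstring to_lower_per_char(std::wstring s)
{
    for (std::size_t i = 0; i < s.size(); ++i) {
        s[i] = static_cast<wchar_t>(
            std::towlower(static_cast<std::wint_t>(s[i])));
    }
    return s;
}

// Hypothetical helper: case-insensitive equality built on the function above.
inline bool equal_ignore_case_simple(const std::wstring& a,
                                     const std::wstring& b)
{
    return to_lower_per_char(a) == to_lower_per_char(b);
}
```

With this approach `equal_ignore_case_simple(L"Hello", L"hELLO")` is true, while `L"MASSE"` and `L"Maße"` can never compare equal because the two strings differ in length. That is exactly the kind of case a per-character conversion cannot handle.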
| maeder
CBuilder Developer |
2005-08-02 01:42:01 PM
Re:case insensitive wchars
JF Jolin < XXXX@XXXXX.COM >writes:
Quote: For some locales, the lstrcmpi function may be insufficient. If this
E.g. to tell if MASSE and Maße should compare equal, understanding of the text is required. |
| Jonathan Benedicto
CBuilder Developer |
2005-08-02 11:02:43 PM
Re:case insensitive wchars
"Thomas Maeder [TeamB]" < XXXX@XXXXX.COM >wrote in message
Quote: It's hard to do anything useful with data whose meaning you don't
class.
Quote: I can't tell you what the best idea for you is.
trying to handle the wchar myself.
Jonathan |
| Duane Hebert
CBuilder Developer |
2005-08-03 12:00:36 AM
Re:case insensitive wchars
"Thomas Maeder [TeamB]" < XXXX@XXXXX.COM >wrote in message
Quote: JF Jolin < XXXX@XXXXX.COM >writes:
We're currently working with this same concept and it's quite complex. The best we've found so far is to find a cross-platform library (QString in our case) that handles Unicode and do the comparisons with these objects. But even this isn't perfect due to examples like the above.
I have no idea how you would do this with anything from std c++. Dealing with std::string or wstring doesn't really work. You basically have to use something like UTF8 as an intermediate, and then you have problems with normalization and such, as well as the fact that there can be more than one valid UTF8 encoding for the same unicode character.
Then on top of that, not everyone uses UTF8. You have to deal with UTF16 and UCS2, for example. |
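The normalization problem can be illustrated with a short C++ sketch. It assumes the remark about multiple encodings refers to canonically equivalent sequences: "é" can be stored as the single code point U+00E9, or as "e" followed by the combining accent U+0301, and the two forms produce different UTF-8 bytes. The function names are invented for illustration only.

```cpp
#include <cassert>
#include <string>

// "é" as one precomposed code point, U+00E9, in UTF-8 (2 bytes).
inline std::string e_acute_precomposed() { return "\xC3\xA9"; }

// "é" as "e" plus U+0301 COMBINING ACUTE ACCENT, in UTF-8 (3 bytes).
inline std::string e_acute_decomposed() { return "e\xCC\x81"; }

// Byte-wise equality, which is all std::string offers on its own.
// It has no notion of canonical equivalence between the two forms.
inline bool bytes_equal(const std::string& a, const std::string& b)
{
    return a == b;
}
```

Both strings display as the same character, yet `bytes_equal` reports them as different. A comparison that should treat them as equal has to normalize both sides first, which is the kind of work a Unicode library such as the ones named in this thread takes on.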
| Jonathan Benedicto
CBuilder Developer |
2005-08-03 12:05:45 AM
Re:case insensitive wchars
"Duane Hebert" < XXXX@XXXXX.COM >wrote in message
Quote: I have no idea how you would do this with anything from std c++.
comparison, or as I have it now, make the case sensitive option use the tolower function.
Jonathan |
| maeder
CBuilder Developer |
2005-08-03 12:22:07 AM
Re:case insensitive wchars
"Jonathan Benedicto" < XXXX@XXXXX.COM >writes:
Quote: I think that maybe I should use that open-source ICU library instead
But this library deals with Unicode, not necessarily with wchar_t strings, if I understand www-306.ibm.com/software/globalization/icu/index.jsp correctly.
I have the feeling that you are seeing an equivalence between wchar_t and Unicode that isn't there. wchar_t objects can be used to represent Unicode characters; but they can be used for other things as well. OTOH, Unicode characters can be represented by wchar_t objects, but there are other representations.
On platforms where sizeof(wchar_t)==2, a representation different from wchar_t is likely to be more useful since 21 bits are required to represent all Unicode code points; if the set of Unicode characters to be represented doesn't exclude these characters (I think they are used in Thailand), you're probably better off with a 32bit character type. |
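To make the 21-bit point concrete, here is a minimal C++11 sketch of how a code point above U+FFFF has to be split into two 16-bit units (a UTF-16 surrogate pair), while a 32-bit character type holds it directly. `encode_utf16` is a hypothetical helper written for this illustration, not something proposed in the thread.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: encodes one Unicode code point as UTF-16.
// Code points up to U+FFFF fit in a single 16-bit unit; anything above
// needs two units (a surrogate pair).
inline std::u16string encode_utf16(char32_t cp)
{
    assert(cp <= 0x10FFFF);              // highest Unicode code point
    assert(cp < 0xD800 || cp > 0xDFFF);  // surrogate values are not characters

    std::u16string out;
    if (cp < 0x10000) {
        out.push_back(static_cast<char16_t>(cp));
    } else {
        const char32_t v = cp - 0x10000;  // 20 significant bits remain
        out.push_back(static_cast<char16_t>(0xD800 + (v >> 10)));    // high surrogate
        out.push_back(static_cast<char16_t>(0xDC00 + (v & 0x3FF)));  // low surrogate
    }
    return out;
}
```

For example, U+0041 ("A") yields one unit, whereas U+1D11E (MUSICAL SYMBOL G CLEF) yields the pair 0xD834, 0xDD1E. Code that treats each 16-bit wchar_t as one character will mishandle the second case.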
| JF Jolin
CBuilder Developer |
2005-08-03 12:25:57 AM
Re:case insensitive wchars
Thomas Maeder [TeamB] < XXXX@XXXXX.COM >wrote:
Quote: E.g. to tell if MASSE and Maße should compare equal, understanding of
What about homonyms? The fruit and the color orange are two different realities. This is endless...
--
JF Jolin |
| Jonathan Benedicto
CBuilder Developer |
2005-08-03 12:28:57 AM
Re:case insensitive wchars
"Thomas Maeder [TeamB]" < XXXX@XXXXX.COM >wrote in message
Quote: I have the feeling that you are seeing an equivalence between wchar_t
Quote: On platforms where sizeof(wchar_t)==2, a representation different from
32-bit character sizes?
Jonathan |
| Hendrik Schober
CBuilder Developer |
2005-08-03 01:32:02 AM
Re:case insensitive wchars
Duane Hebert < XXXX@XXXXX.COM >wrote:
Quote: [...]
'wchar_t's size at compile-time and decide what to use for character types. On Windows, the main internal representation is UCS-2 (i.e. those Unicode chars that need only 16bit) because that's what Windows does. On OS X the same type (but with a 'wchar_t' of 32bit) carries UTF-32 (which I think is safe so far, as there aren't any UTF-32 chars needing more than 32bit), because it's (AFAIK) what OS X uses internally.
Schobi
--
XXXX@XXXXX.COM is never read
I'm Schobi at suespammers dot org
"Coming back to where you started is not the same as never leaving"
Terry Pratchett |
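A compile-time check on the size of wchar_t, as this post describes, could be sketched in C++ along the following lines. The trait `wide_encoding` and the typedef `native_wide_encoding` are hypothetical names, and the mapping of 16 bits to UCS-2/UTF-16 and 32 bits to UTF-32 is an assumption taken from the post, not something the C++ standard guarantees.

```cpp
#include <cassert>
#include <climits>
#include <cstddef>

// Hypothetical trait: picks an assumed encoding from the width of wchar_t.
// Only the two widths mentioned in the post are provided; any other width
// fails to compile, which surfaces an unsupported platform early.
template <std::size_t Bits> struct wide_encoding;

template <> struct wide_encoding<16> {
    enum { bits = 16 };
    static const char* name() { return "UCS-2/UTF-16"; }
};

template <> struct wide_encoding<32> {
    enum { bits = 32 };
    static const char* name() { return "UTF-32"; }
};

// Resolved entirely at compile time from the platform's wchar_t.
typedef wide_encoding<sizeof(wchar_t) * CHAR_BIT> native_wide_encoding;
```

Code that converts to or from the internal representation can then branch on `native_wide_encoding` through further specializations instead of preprocessor checks for individual platforms.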
| Duane Hebert
CBuilder Developer |
2005-08-03 02:22:35 AM
Re:case insensitive wchars
"Hendrik Schober" < XXXX@XXXXX.COM >wrote in message
Quote: We do Unicode in cross-platform code using
windows. We've been using std::string with UTF8 for the config stuff. The classes that are non-gui deal with them straight. The gui classes that allow user I/O with some of this data use QString, which has to/from utf8 functions. So far it's been working well, but the OP asked for a cross-platform, standard way of doing things. Your answer may be more suitable to him.
Quote: On Windows, the main internal representation |
