Network Working Group B. Hoehrmann Internet-Draft September 15, 2010 Intended status: Informational Expires: March 19, 2011 The i;codepoint collation draft-hoehrmann-cp-collation-01 Abstract This memo describes the "i;codepoint" collation. Character strings are compared based on the Unicode scalar values of the characters. The collation supports equality, substring, and ordering operations. Status of This Memo This Internet-Draft is submitted in full conformance with the provisions of BCP 78 and BCP 79. Internet-Drafts are working documents of the Internet Engineering Task Force (IETF). Note that other groups may also distribute working documents as Internet-Drafts. The list of current Internet- Drafts is at http://datatracker.ietf.org/drafts/current/. Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress." This Internet-Draft will expire on March 19, 2011. Copyright Notice Copyright (c) 2010 IETF Trust and the persons identified as the document authors. All rights reserved. This document is subject to BCP 78 and the IETF Trust's Legal Provisions Relating to IETF Documents (http://trustee.ietf.org/license-info) in effect on the date of publication of this document. Please review these documents carefully, as they describe your rights and restrictions with respect to this document. Code Components extracted from this document must include Simplified BSD License text as described in Section 4.e of the Trust Legal Provisions and are provided without warranty as described in the Simplified BSD License. Hoehrmann Expires March 19, 2011 [Page 1] Internet-Draft The i;codepoint collation September 2010 1. Introduction The i;codepoint collation operates on Unicode strings and treats any and all differences between two strings as significant. Ordering of different strings is determined by the Unicode scalar values of the characters. It produces usable results where further information is unavailable. In that it is suitable as default collation. The equality operation determines if two strings are identical. This makes the collation suitable for use with strings known to be in some canonical form. Similarily, applications that require strings to be in a canonical but otherwise arbitrary order may find this collation the most efficient as it requires no transformations. 2. Definition The i;codepoint collation is the same as the i;octet collation except that it operates on sequences of Unicode scalar values, not octets. Note that by definition the set of Unicode scalar values excludes the surrogate code points and as such they do not occur in valid input. 3. Security Considerations None beyond those in RFC 4790 [RFC4790]. 4. IANA Considerations This document defines the i;codepoint collation as per [RFC4790]. 5. Registration document i;codepoint Unicode identity equality order substring RFC XXXX IETF bjoern@hoehrmann.de 6. References [RFC4790] Newman, C., Duerst, M., and A. Gulbrandsen, "Internet Application Protocol Collation Registry", RFC 4790, March 2007. Hoehrmann Expires March 19, 2011 [Page 2] Internet-Draft The i;codepoint collation September 2010 Author's Address Bjoern Hoehrmann Mittelstrasse 50 39114 Magdeburg Germany EMail: mailto:bjoern@hoehrmann.de URI: http://bjoern.hoehrmann.de Note: Please write "Bjoern Hoehrmann" with o-umlaut (U+00F6) wherever possible, e.g., as "Björn Höhrmann" in HTML and XML. Hoehrmann Expires March 19, 2011 [Page 3]