Diacritic

From Wikipedia, the free encyclopedia

Diacritical marks

accent

acute accent ( ˊ )
double acute accent ( ˝ )
grave accent ( ˋ )

breve ( ˘ )
caron / háček ( ˇ )
cedilla ( ¸ )
circumflex ( ˆ )
diaeresis / umlaut ( ¨ )
dot ( · )

anunaasika ( ˙ )
anusvaara (  ̣ )

hook / dấu hỏi (  ̉ )
macron ( ˉ )
ogonek ( ˛ )
ring / kroužek ( ˚ )
rough breathing / spiritus asper (  ῾ )
smooth breathing / spiritus lenis (  ᾿ )

Marks sometimes used as diacritics

apostrophe ( ' )
bar ( | )
colon ( : )
comma ( , )
hyphen ( ˗ )
tilde ( ˜ )
titlo (  ҃ )

A diacritical mark or diacritic, in some cases also called an accent mark, is a small sign added to a letter to alter pronunciation or to distinguish between similar words. The term derives from Greek διακριτικός (diakritikos, distinguishing). Note that diacritic is a noun and diacritical is the corresponding adjective.

A diacritical mark can appear above or below a letter, or in some other position. Its main usage is to change the phonetic value of the letter to which it is added, but it may also be used to modify the pronunciation of a whole word or syllable, like the tone marks of tonal languages, to distinguish between homographs, to make abbreviations, such as the titlo in old Slavic texts, or to change the meaning of a letter, such as denoting numerals in numeral systems like early Greek numerals.

A letter which has been modified by a diacritic may be treated as a new, individual letter, or simply as a letter-diacritic combination, in orthography and collation. This varies from language to language, and in some cases from symbol to symbol within a single language.

Types of diacritic

Marks that are sometimes diacritics, but also have other uses, are:

  • ( | ) bar through the basic letter
  • ( , ) comma
  • ( ~ ) tilde
  • (  ҃ ) titlo
  • ( ' ) apostrophe
  • ( : ) colon, used to attach native affixes (such as case markers) to foreign words and abbreviations in a few languages (see below).
  • ( - ) hyphen: in English, a hyphen can be used to break a word between syllables in order to resolve ambiguities in pronunciation:
    • repair (fix) compared to re-pair (pair again).
    • Kuringgai becomes Ku-ring-gai.

See also Category:Diacritics and Category:Uncommon Latin letters.

Alphabetization or collation

Main article: Collation

Different languages use different rules to put characters with diacritics in alphabetical order. French treats letters with diacritical marks the same as the underlying letter for purposes of ordering and dictionaries.

The Scandinavian languages, by contrast, treat the characters with diacritics ä, ö and å as new and separate letters of the alphabet, and sort them after z. Usually ä is sorted as equal to æ (ash) and ö is sorted as equal to ø (o-slash). Also, aa, when used as an alternative spelling to å, is sorted as such. Other letters modified by diacritics are treated as variants of the underlying letter, with the exception that ü is frequently sorted as y.

Languages that treat accented letters as variants of the underlying letter usually alphabetize words with such symbols immediately after similar unmarked words. For instance, where two German words differ only by an umlaut, the word without it is sorted first in German dictionaries (e.g. schon and then schön, or fallen and then fällen). However, where names are concerned (e.g. in phone books or in author catalogues in libraries), umlauts are often treated as combinations of the vowel with a suffixed e; Austrian phone books now treat umlauts as separate letters (immediately following the underlying letter). In Spanish, although ñ is considered a new letter different from n and placed between n and o, acute accents and diaereses are ignored in alphabetization.

For a comprehensive list of the collating orders in various languages, see Alphabets derived from the Latin.
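
The two approaches described in this section can be sketched in code. The following is a minimal illustration in Python, not a real collation implementation: the function names and the hard-coded Swedish alphabet are assumptions made for the example, and production systems normally rely on locale-aware collation libraries instead.

```python
import unicodedata


def strip_marks(word: str) -> str:
    """Return the word with combining marks removed, leaving base letters."""
    decomposed = unicodedata.normalize("NFD", word)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def variant_key(word: str):
    """Treat marked letters as variants of the underlying letter.

    Words are compared by base letters first; the original spelling is
    only a tie-breaker, so the unmarked word sorts before the marked one.
    """
    folded = unicodedata.normalize("NFC", word.lower())
    return (strip_marks(folded), folded)


# Assumption for illustration: a simplified Swedish alphabet, with
# å, ä and ö as separate letters placed after z.
SWEDISH_ALPHABET = "abcdefghijklmnopqrstuvwxyzåäö"
SWEDISH_RANK = {letter: i for i, letter in enumerate(SWEDISH_ALPHABET)}


def swedish_key(word: str):
    """Treat å, ä and ö as letters of their own, sorted after z."""
    folded = unicodedata.normalize("NFC", word.lower())
    # Characters outside the alphabet are placed after every known letter.
    return [SWEDISH_RANK.get(ch, len(SWEDISH_ALPHABET)) for ch in folded]


german = sorted(["schön", "schon", "fällen", "fallen"], key=variant_key)
swedish = sorted(["öl", "zebra", "ärt", "år", "apa"], key=swedish_key)
```

With the first key, schon sorts immediately before schön; with the second, words beginning with å, ä and ö all come after zebra.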

Non-alphabetic scripts

Some non-alphabetic scripts also employ symbols that function essentially as diacritics.

  • Non-pure abjads (such as Hebrew and Arabic script) and abugidas use diacritics for denoting vowels. Hebrew and Arabic also indicate consonant doubling and change with diacritics; Hebrew and Devanagari use them for foreign sounds. Devanagari and related abugidas also use a diacritical mark called a virama to mark the absence of a vowel.
  • Emoticons are commonly created with diacritic symbols, especially Japanese emoticons on popular boards such as 2chan and the many other imageboards suffixed -chan.

Generation with computers

Modern computer technology was developed mostly in English-speaking countries, so data formats, keyboard layouts and the like were designed with a bias towards English, which uses a "simple" alphabet without diacritical marks. This has led to fears internationally that the marks and accents may become obsolete to facilitate the worldwide exchange of data. Efforts have been made to create domain names that extend beyond the English alphabet, known as internationalized domain names; an example is "pokémon.com".
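
As a rough illustration of how an internationalized domain name is carried over ASCII-only infrastructure, the sketch below uses Python's built-in "idna" codec, which follows the older IDNA 2003 rules. The helper names to_ascii and to_unicode are chosen for this example only.

```python
def to_ascii(domain: str) -> bytes:
    """Encode a domain name into its ASCII-compatible form."""
    return domain.encode("idna")


def to_unicode(encoded: bytes) -> str:
    """Decode an ASCII-compatible domain name back to readable text."""
    return encoded.decode("idna")


# Labels containing non-ASCII letters are rewritten with an "xn--" prefix;
# labels that are already plain ASCII are left unchanged.
encoded = to_ascii("pok\u00e9mon.com")
round_trip = to_unicode(encoded)
```

The encoded form is what is actually looked up in the domain name system; the readable form is restored only for display.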

Depending on the keyboard layout, which differs between countries, letters with diacritics are more or less easy to enter on computers and typewriters. Some have their own keys; others are created by first pressing the key with the diacritic mark and then the letter to place it on. Such a key is sometimes referred to as a dead key, as it produces no output of its own but modifies the output of the key pressed after it.

In modern Microsoft Windows operating systems, the US International keyboard layout allows one to type almost all diacritics directly: "+e gives ë, ~+o gives õ, etc. On Apple Macintosh computers, there are keyboard shortcuts for the most common diacritics: Option-e followed by a vowel places an acute accent, Option-u followed by a vowel gives an umlaut, Option-c gives a cedilla, etc. Diacritics can be composed in most X Window System keyboard layouts.
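
The dead-key behaviour described above can be modelled as a small state machine. The sketch below is a simplified illustration in Python, not the logic of any actual keyboard driver; the DEAD_KEYS table and the type_keys function are hypothetical names, and the fallback rules (a dead key followed by a space or by a letter with no accented form) are assumptions.

```python
import unicodedata

# Assumed mapping from dead keys to Unicode combining marks.
DEAD_KEYS = {
    '"': "\u0308",  # diaeresis
    "~": "\u0303",  # tilde
    "'": "\u0301",  # acute accent
    "`": "\u0300",  # grave accent
    "^": "\u0302",  # circumflex
}


def type_keys(keys: str) -> str:
    """Simulate typing a sequence of keys on a layout with dead keys."""
    out = []
    pending = None  # the dead key waiting for its base letter, if any
    for key in keys:
        if pending is None:
            if key in DEAD_KEYS:
                pending = key  # a dead key produces no output of its own
            else:
                out.append(key)
            continue
        # Try to merge the base letter and the mark into a single character.
        composed = unicodedata.normalize("NFC", key + DEAD_KEYS[pending])
        if len(composed) == 1:
            out.append(composed)
        elif key == " ":
            out.append(pending)  # dead key + space yields the mark's own key
        else:
            out.append(pending + key)  # no accented form: emit both keys
        pending = None
    if pending is not None:
        out.append(pending)  # dead key left over at the end of input
    return "".join(out)
```

Under these assumptions, type_keys('"e') returns ë and type_keys("~o") returns õ, matching the examples in the text.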

On computers, whether certain diacritics can be used also depends on the available code pages. Unicode solves this problem by assigning every known character its own code; if this code is known, most modern computer systems provide a method to input it. With Unicode it is also possible to combine diacritical marks with most characters.
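
The difference between a character with its own code and a letter combined with a separate diacritical mark can be shown with Python's standard unicodedata module. This is a small sketch; the helper names compose and decompose are chosen for the example.

```python
import unicodedata


def compose(text: str) -> str:
    """Merge base letters and combining marks into single code points where possible (NFC)."""
    return unicodedata.normalize("NFC", text)


def decompose(text: str) -> str:
    """Split accented characters into a base letter plus combining marks (NFD)."""
    return unicodedata.normalize("NFD", text)


precomposed = "\u00eb"  # ë as a single code point
combined = "e\u0308"    # e followed by COMBINING DIAERESIS

# The two strings look the same when displayed but compare as different
# until both are brought to the same normalization form.
same_before = precomposed == combined
same_after = compose(combined) == precomposed
```

Here same_before is False and same_after is True, which is why text is usually normalized before it is compared or sorted.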

Languages with letters containing diacritics

The following languages have letters which contain or are made up of diacritics.

  • Esperanto has the symbols ŭ, ĉ, ĝ, ĥ, ĵ and ŝ, which are included in the alphabet, and considered separate letters.
  • Estonian has a distinct letter õ which contains a tilde. Estonian "dotted vowels" ä, ö, ü are similar to German, but these are also distinct letters, not like German umlauted letters. All four have their own place in the alphabet, between w and x.
  • Faroese uses acute accents, digraphs, and other special letters. All are considered separate letters, and have their own place in the alphabet: á, ð, í, ó, ú, ý, æ and ø.
  • Finnish uses dotted vowels (ä and ö). As in Swedish and Estonian, these are regarded as individual letters, rather than vowel + umlaut combinations (as happens in German). It also uses the characters å, š and ž in foreign names and loanwords. Å, ä and ö collate after z in the Finnish alphabet.
  • Hawaiian has the ʻokina (ʻ), often rendered as ('), which is considered a letter of the alphabet.
  • Hungarian uses the acute and double acute accent (unique to Hungarian): á é í ó ú and ő ű. The acute accent indicates the long form of a vowel, while the double acute performs the same function for ö and ü. Both long and short forms of the vowels are listed separately in the Hungarian alphabet.
  • Icelandic uses acute accents, digraphs, and other special letters. All are considered separate letters, and have their own place in the alphabet: á, ð, é, í, ó, ú, ý, æ, ö and þ.
  • Latvian has the following letters: ā ē ī ū ŗ ļ ķ ņ ģ š ž č.
  • Lithuanian. In general usage, where letters appear with the caron (č, š and ž) they are considered as separate letters from c, s or z and collated separately; letters with the ogonek (ą, ę, į and ų), the macron (ū) and the superdot (ė) are considered as separate letters as well, but not given a unique collation order.
  • Livonian has the following letters: ā, ä, ǟ, ḑ, ē, ī, ļ, ņ, ō, ȯ, ȱ, õ, ȭ, ŗ, š, ț, ū, ž.
  • Maltese uses a C, G, and Z with a dot over them (Ċ, Ġ, Ż), and also has an H with an extra horizontal bar. For uppercase H, the extra bar is written slightly above the usual bar. For lowercase H, the extra bar is written crossing the vertical, like a t, and not touching the lower part (Ħ, ħ). The above characters are considered separate letters. The letter 'c' without a dot has fallen out of use due to redundancy. 'Ċ' is pronounced like the English 'ch' and 'k' is used as a hard c as in 'cat'.
  • Polish has the following letters: ą ć ę ł ń ó ś ź ż.
  • Romanian uses a breve on the letter a (ă) to indicate the sound schwa /ə/, as well as a circumflex over the letters a (â) and i (î) for the sound /ɨ/. Romanian also writes a comma below the letters s (ș) and t (ț) to represent the sounds /ʃ/ and /ʦ/, respectively.
  • Among the Scandinavian languages, Danish and Norwegian have long used ash (æ, actually a ligature) and o-slash (ø), but have more recently incorporated a-ring (å), following the Swedish example. Historically, the å developed from a ligature formed by writing a small a on top of the letter a; if an å character is unavailable, some Scandinavian languages allow the substitution of a doubled a. The Scandinavian languages collate these letters after z, but have different collation standards. Danish and Norwegian both follow the order æ, ø, å.
  • Spanish: the character ñ is considered a letter, and collated between n and o.
  • Swedish uses characters identical to a-diaeresis (ä) and o-diaeresis (ö) in the place of ash and o-slash, in addition to the a-circle (å). Historically, the diaeresis on the Swedish letters ä and ö, like the German umlaut, developed from a small gothic e written on top of the letters. These letters are collated after z, in the order å, ä, ö.
  • Turkish uses a G with a breve (Ğ), two letters with a diaeresis (Ö and Ü, representing two rounded front vowels), two letters with a cedilla (Ç and Ş, representing the affricate /tʃ/ and the fricative /ʃ/), and also possesses a dotted capital İ (and a dotless lowercase ı representing a high unrounded back vowel). In Turkish each of these is a separate letter rather than a version of another letter; dotted capital İ and lowercase i are the same letter, as are dotless capital I and lowercase ı. Typographically, Ç and Ş are often rendered with a subdot; when a hook is used, it tends to have more of a comma shape than the usual cedilla.
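
The Turkish pairing of dotted and dotless i in the list above is a common source of software errors, because default case conversion pairs i with I. The following Python sketch shows one way to apply the Turkish pairing; the function names are hypothetical and only the two i letters are given special treatment.

```python
# Map the two Turkish i letters explicitly before applying the default
# case conversion, which would otherwise pair i with I.
_TR_UPPER = str.maketrans({"i": "İ", "ı": "I"})
_TR_LOWER = str.maketrans({"İ": "i", "I": "ı"})


def turkish_upper(text: str) -> str:
    """Uppercase text using the Turkish pairing i/İ and ı/I."""
    return text.translate(_TR_UPPER).upper()


def turkish_lower(text: str) -> str:
    """Lowercase text using the Turkish pairing İ/i and I/ı."""
    return text.translate(_TR_LOWER).lower()
```

For example, turkish_upper("istanbul") gives İSTANBUL, whereas the default conversion would give ISTANBUL.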

Languages with diacritics that do not produce new letters

The following is a list of languages with letter-diacritic combinations that are not considered independent letters.

  • Catalan has the following composite characters: à ç é è í ï ó ò ú ü. The acute and the grave accent indicate stress and vowel height, the cedilla marks the result of a historical palatalization, and the diaeresis mark indicates either a hiatus, or that the letter u is pronounced when the graphemes gü, qü are followed by e or i.
  • Czech has the following composite characters: á, č, ď, é, ě, í, ň, ó, ř, š, ť, ú, ů, ý, ž.
  • Dutch uses the diaeresis. For example in ruïne it means that the u and the i are separately pronounced in their usual way, and not in the way that the combination ui is normally pronounced. Thus it works as a separation sign and not as an indication for an alternative version of the i. Diacritics can be used for emphasis (érg koud for very cold) or for disambiguation between a number of words that are spelled the same when context doesn't indicate the correct meaning (één appel = one apple, een appel = an apple; vóórkomen = to occur, voorkómen = to prevent). Grave and acute accents are used on a very small number of words, mostly loanwords.
  • English is one of the few European languages that does not use diacritical marks, except for some borrowings taken unchanged, mainly from French, in which case the diacritic is often omitted. The words most likely to keep the diacritic are apparently those containing é, such as café and résumé (especially to distinguish it from the verb "resume"), and the word naïve (see List of English words with diacritics). English used to use the diaeresis much as Dutch still does (as in words such as "coöperate"), but this has been falling out of use; The New Yorker is one of the few publications whose house style retains this feature. The grave accent was also once used, chiefly in poetry and songs, to modify the pronunciation of words ending in -ed; -èd indicates a separate syllable.
  • Estonian. Carons in š or ž may appear only in foreign proper names and loanwords, but may be also substituted with sh or zh in some texts. Apostrophes can be used in the declension of some foreign names to separate the stem from any declension endings; e.g., Monet' or Monet'sse for the genitive case and illative case, respectively, for (the famous painter) "Monet".
  • Finnish uses a colon to decline loanwords and abbreviations; e.g., USA:han for the illative case of "USA". But for loanwords ending orthographically in a consonant but phonetically in a vowel, an apostrophe is used instead: e.g. show'n for the genitive case of the English loan "show".
  • French uses the grave, acute, circumflex, cedilla and diaeresis.
  • German and Swiss German have the Umlaut (¨). This can be used over a, o, or u to indicate vowel modification. For instance: Ofen /'o:fən/; Öfen /'ø:fən/, which in this case makes the difference between singular and plural (“oven”/“ovens”). The sign originated in a superscript e; a handwritten Sütterlin e resembles two parallel vertical lines, like an umlaut.
  • Hebrew has many diacritic marks, known as niqqud, that are written above and below the script to represent vowels. These must be distinguished from cantillation marks, which are keys to pronunciation and syntax.
  • Irish uses the acute accent to indicate that a vowel is long. It is known as síneadh fada (long sign) or simply fada (long) in Irish.
  • Maltese sometimes uses diacritics on some vowels to indicate stress or long vowels, but this is restricted to pronunciation assistance in dictionaries.
  • Portuguese has the following composite characters: à á â ã ç é ê í ó ô õ ú ü. The acute and the circumflex accent indicate stress and vowel height, the grave accent indicates crasis, the cedilla marks the result of a historical palatalization, and the diaeresis mark indicates that the letter u is pronounced when the graphemes gü, qü are followed by e or i.
  • Acute accents are also used in Slavic language dictionaries and textbooks to indicate lexical stress, placed over the vowel of the stressed syllable. This can also serve to disambiguate meaning (e.g., in Russian писа́ть (pisát) means "to write", but пи́сать (písat) means "to piss").
  • Slovak has the acute, the caron, the circumflex (only above o) and the diaeresis (only above a).
  • Spanish uses the acute accent and the diaeresis. The acute is used on a vowel in a stressed syllable in words with irregular stress patterns. It can also be used to "break up" a diphthong, as in tío (pronounced /'tio/, rather than /tjo/ as it would be without the accent). Moreover, the acute can be used to distinguish words that are otherwise spelt alike, such as si ("if") and sí ("yes"), and also to distinguish interrogative and exclamative pronouns from homophones with a different grammatical function, such as donde/¿dónde? ("where"/"where?") or como/¿cómo? ("as"/"how?"). The diaeresis is used only over u (ü) so that it is pronounced /w/ in the combinations gue and gui (where u is normally silent), for example ambigüedad. In poetry, the diaeresis may be used on i and u as a way to force a hiatus.
  • Swedish sometimes uses an optional acute accent to show non-standard stress, as in idé, kafé or resumé.
  • Tagalog uses a hyphen after a consonant to indicate a syllable break (nag-alis /nag·a·lís/ as opposed to nagalis /na·ga·lís/). A hyphen is not necessary between two vowels, vowels being distinctly pronounced in Tagalog (tauhan /ta·ú·han/, buo /bu·ô/).
  • Tamil does not have any diacritics of its own, but uses the Western numerals 2, 3 and 4 as diacritics to represent aspirated, voiced, and voiced-aspirated consonants when the Tamil script is used to write long passages in Sanskrit.
  • Vietnamese uses the acute accent (dấu sắc), the grave accent (dấu huyền), the tilde (dấu ngã), the dot below (dấu nặng) and the hook (dấu hỏi) on vowels as tone indicators.
  • Welsh uses the circumflex, diaeresis, acute and grave accents on its seven vowels a, e, i, o, u, w, y. The most common is the circumflex (which it calls to bach, meaning "little roof") to denote a long vowel, usually to disambiguate it from a similar word with a short vowel. The rarer grave accent has the opposite effect, shortening vowel sounds which would usually be pronounced long. The acute accent and diaeresis are also occasionally used, to denote stress and vowel separation respectively. The w-circumflex and the y-circumflex are among the most common accented characters in Welsh, but unusual in languages generally, and were until recently very hard to obtain in word-processed and HTML documents.
