New concepts
````````````

In 3.7 or earlier the system kept all languages inside one version and would
allow only one person to modify their data. This made it impossible for
multiple persons to edit each language separately and at the same time, instead
they would have to wait for one language to be updated and published.

Languages are now more self-contained and independent. This allows languages to
be created and edited separately by multiple users, it also simplifies things
like approval since you can consider one language only.

The major changes to content model are:

- A user only edits one version and language at a time.
- Objects can be created in any language on any site-access, the first language
  of the object is recorded.
- Objects can be translated to as many languages as necessary.
- Available languages can be controlled per site-access, allowing languages to
  be filtered.

Global translations
-------------------

A new table (*ezcontent_language*) is introduced which replaces the
old table (*ezcontent_translation*).
Each entry in the table gets a new *id* which is a representation of
the bitnumber as a value. ie. 2^bitnum. Id 1 (2^0) is reserved by
the system.
The language is stored with a locale code (e.g. nor-NO) and a name, in
addition it also has the field *disabled* which can be used to disable
a language on the site.

The maximum number of languages is 30.

The global table looks like the following::

  ezcontent_language
  +----+----------+--------+--------------------------+
  | id | disabled | locale | name                     |
  +----+----------+--------+--------------------------+
  |  2 |        0 | eng-GB | English (United Kingdom) |
  |  4 |        0 | rus-RU | Russian                  |
  |  8 |        0 | slk-SK | Slovak                   |
  | 16 |        0 | eng-US | English (American)       |
  | 32 |        0 | nor-NO | Norwegian (Bokmal)       |
  +----+----------+--------+--------------------------+


Initial language
----------------

Whenever an object is created for the first time it will store the language the
object was created in. This allows the system to find one language on the
object when none of the others can be used (e.g. filtered away). The value will
be stored in the field *initial_language_id* using the *language_id* from
the *ezcontent_language* table.

The new field looks like::

  ezcontentobject
  +-----+----------------------+
  | id  | initial_language_id  |
  +-----+----------------------+
  |   1 |                    2 | (eng-GB)
  |   2 |                    2 | (eng-GB)
  |   5 |                    2 | (eng-GB)
  |  10 |                    2 | (eng-GB)
  |  42 |                   16 | (eng-US)
  |  44 |                    4 | (rus-RU)
  +-----+----------------------+

Available languages
-------------------

To make it easier to figure out which languages are available on an object a
new attribute is added which returns the languages as an array of
locale codes.

The new attribute is called *available_languages* and will use the
*language_mask* field combined with the global language table to
figure out the correct codes.

The result of this attribute is shown below::

  Object  1: array( 'eng-GB' )
  Object  2: array( 'eng-GB', 'slk-SK' )
  Object  5: array( 'eng-GB', 'rus-RU' )
  Object 10: array( 'eng-GB' )
  Object 42: array( 'eng-US' )
  Object 44: array( 'rus-RU' )

Language priority and filtering
-------------------------------

With the new concepts is no longer directly known which languages to use per
object, instead this must be calculated based on the requested languages in
prioritized order and the available languages (in object).

The requested languages might be different per site-access so the solution must
be fast and scalable.

A language priority list is basically a list of languages to show and the first
language will get higher priority than the second and so on.

Filtering languages
~~~~~~~~~~~~~~~~~~~

To perform proper language filtering and prioritization a new bit-field
algorithm is introduced. All languages used in an eZ Publish instance
are identified by powers of 2 to be able to use bitmap *and* for selecting
rows containing information on objects or attributes in prioritized languages.
Id 1 (2^0) is reserved for marking objects which are always available.

To show how the bit-field algorithm works, let's have a look at the
ezcontentobject table. A new field is introduced to identify languages in which
last published version of an object exist. This field contains the sum of id
values of these languages. Because id values are powers of 2 it is possible to
identify the languages and/or to select objects published in any of prioritized
languages::

  ezcontentobject
  +-----+---------------+
  | id  | language_mask |
  +-----+---------------+
  |   1 |             3 | %00011
  |   2 |            10 | %01010
  |   5 |             6 | %00110
  |  10 |             2 | %00010
  |  42 |            16 | %10000
  |  44 |             4 | %00100
  +-----+---------------+

Note that language_mask 3 in the first row denotes that the language is English
(id 2) and the object is always available (id 1), even if the prioritized list
does not contain English (eng-GB).

For example, to filter objects which exist in Slovak (id 8) or Russian (id 4),
we might run the following SQL::

  SELECT id FROM ezcontentobject
  WHERE language_mask & 12 > 0

Whenever the available languages change for the object (i.e. new version
published) it the bit-field is updated.

Another bit-field is used in tables containing the language_code attribute. This
bit-field represents the same information as the language_code but uses the id
from ezcontent_language table to make easy to select correct rows with respect
to the list of prioritized languages.

Consider the following example::

  ezcontentobject_attribute
  +----+---------+------------------+---------------+-------------+
  | id | version | contentobject_id | language_code | language_id |
  +----+---------+------------------+---------------+-------------+
  | 10 |       1 |                1 |        eng-GB |           3 |
  | 31 |       1 |                2 |        eng-GB |           2 |
  | 32 |       1 |                2 |        slk-SK |           8 |
  | 46 |       1 |                5 |        eng-GB |           2 |
  | 47 |       1 |                5 |        rus-RU |           4 |
  +----+---------+------------------+---------------+-------------+   

To select content object attributes in prioritized languages, we might use a
single SQL containing a condition using bit-fields, bitmap *and* operation and
simple arithmetic operations (multiplication, division and sum).

Consider the following prioritized language list: slk-SK, eng-GB. To check if
the attribute is in slk-SK we can check if language_id & 8 is not zero etc.
If we perform the following operation::

  ( language_id & 8 ) / 2
  + language_id & 2
  + language_id & 1

we will get the bit-field value having 1 on 0th bit if the attribute is always
available (i.e. the object containing it have to be shown even if it is not in
any of prioritized languages), having 1 on 1st bit if the attribute is
available in eng-GB and having 1 on 2nd bit if the attribute is available in
slk-SK.

To find out if the attribute is in one of the prioritized languages and that
there is no translation of this attribute in more prioritized language, we
select those which hold the following condition::

  ( (     language_mask - language_id ) & 8 ) / 2 
      + ( language_mask - language_id ) & 2 
      + ( language_mask - language_id ) & 1
  < ( language_id & 8 ) / 2
    + language_id & 2
    + language_id & 1

where *language_mask* is the ezcontentobject attribute containing the bit-field of
available languages.

All languages
~~~~~~~~~~~~~

To solve the issues with cronjobs and admin interfaces which must always list
all available languages we introduce a special flag. When used the SQLs will
include all objects even if it does not have any of the other languages.

Translated attributes
---------------------

With the new database concepts it is possible for multiple users editing the
same object but in different languages (and versions). This means that storage
of attributes is changed since one version does not reflect all languages
anymore.

In short it means that for one version of an object there will only be
attribute data for one language when editing. When the version is published the
system will copy the attribute for all other languages from the last published
version to this one.

The table will then look like::

  ezcontentobject_attribute
  +----+------------------+---------+---------------+-------------+
  | id | contentobject_id | version | language_code | language_id |
  +----+------------------+---------+---------------+-------------+
  |  7 |                5 |       1 | eng-GB        |           2 |
  |  8 |                5 |       1 | eng-GB        |           2 |
  |  9 |                5 |       2 | rus-RU        |           4 |
  | 10 |                5 |       2 | rus-RU        |           4 |
  |  7 |                5 |       2 | eng-GB        |           2 |
  |  8 |                5 |       2 | eng-GB        |           2 |
  |  7 |                5 |       3 | eng-GB        |           2 |
  |  8 |                5 |       3 | eng-GB        |           2 |
  |  9 |                5 |       3 | rus-RU        |           4 |
  | 10 |                5 |       3 | rus-RU        |           4 |
  |  7 |                5 |       4 | slk-SK        |           8 |
  |  8 |                5 |       4 | slk-SK        |           8 |
  |  7 |                5 |       4 | eng-GB        |           2 |
  |  8 |                5 |       4 | eng-GB        |           2 |
  |  9 |                5 |       4 | rus-RU        |           4 |
  | 10 |                5 |       4 | rus-RU        |           4 |
  | 22 |               44 |       1 | eng-GB        |           2 |
  | 23 |               44 |       1 | eng-GB        |           2 |
  +----+------------------+---------+---------------+-------------+

Node assignment
---------------

As with Translated attributes there are now issues with the node assignments as
well. For instance if two users edits the same object in different languages at
the same time they will each get a copy of the published node assignment, if
they both modify it there will be conflicts when publishing.

The avoid this conflict the system will no longer allowed locations to be
added, removed or moved from the admin interface (except first version). Any
changes to locations will have to be done from the admin interface (locations
tab).

The system will track the operations in the the node-assignment table and merge
the result with the "live" tree structure to properly add, remove or move
nodes.

Note: It will still be possible for PHP scripts to add, remove and move
      locations.


Object relations
----------------

Currently all relations are now stored only per version and not per
language. This means that there will be possible conflicts when two languages
are edited at the same time.

When publishing a version/language it makes sure the relation list is merged
together with the previous published data. This means that removed relations
must be marked as removed and not just deleted from the database.

Ignoring translation
--------------------

Some objects will need to always be available no matter which site-access is
used. For instance user objects will need to be fetched even if they are not
translated to the current language.

The current solution already has this on class attributes but is extended
further.

Class
~~~~~

A new field called *always_available* (default 0) is added to *class* table
which defines if objects of this class are always available. If this is set to
1 any object created from it will always be available.  One of the major uses
for this is for classes which needs to be always available, for instance users
or user groups.

When creating objects the objects will copy the current setting in the
class, any changes to the setting in the class will not affect existing
objects. This allows this setting to be switched per object at a later time.

By default eZ Publish will come with this setting turned on for Folders, Users
and User Groups.

Image datatype
++++++++++++++

The ezimage datatype got some changes to the XML storage due to the new
multi-language features. Now it always stores the <original> tag with
attribute ID, version and language and it is only updated when a new image
is uploaded.

An example::

  <original attribute_id="1530"
            attribute_version="1"
            attribute_language="eng-GB" />

This was required since the system will now copy over attribute data when
translation is disabled by using pure SQL commands.

Upgrade
-------

The system will still work with old entries having empty data in <original> but
will fill in the information if it is edited and published. Also old
non-translatable attributes have the <original> tag set correctly and will still
work.


Objects
~~~~~~~

When an object is always available the first bit (bit 0) will be set to 1. This
means it will be fetched no matter which priority list is used. It also means
that the object can still be translated. For objects which has none of the
available languages it will pick the initial language.

Workflows
`````````

Each version of an object can now only contain language, this means that there
is no need to change the workflow system or calls to the 'publish'
process. Instead the workflow events *approval* and *multiplexer* will get
support for filtering on language, they will store a language mask in the
*data_int2* DB field to choose which language to match on, a value of 0 means
any language.

Permissions
```````````

Being able to restrict access based on language is very useful since a
translator might not be allowed to modify the original content but only
create a new translation. 

The following access functions have been given language support:

- content/create : specify if user is allowed to create new object in specified
  language. 
- content/edit : specify if user has got access to edit the specified
  language(s) and create new translation for specified language(s).

To translate you need *content/read* and *content/edit* for the language to
translate into, this means users can translate objects without having
*content/create* rights.

View-cache
``````````

With the new multi-language feature a view-cache is no longer made for one
specific language but for a prioritized list of languages, this means that the
directory storage must change. We replace the singular language code and
site-designs with the name of the site-access  e.g. if we have the languages
*nor-NO*, *eng-GB* and *ger-DE* in the site-access *no* we get::

  var/cache/no/full/

We also optimize the cleaning code to use the *glob* call with *GLOB_BRACE*
expansion, e.g::

  glob( "var/cache/{admin,mysite}/{full,line}/1/5/15-*.cache",
        GLOB_BRACE );

This change means that it now uses *AvailableSiteAccessList* and
*RelatedSiteAccessList* (*site.ini*) when cleaning up caches.

Trash
`````

Currently objects are restored from trash by going to content/edit and then
getting a new version. This will not work very well with the new system where
there is only one language per version.

A new view in the content module handles restoration of trash objects
(content/restore). It will allow the user to restore to old location or browse
for a new location.

Cronjobs
````````

Cronjobs are run with all languages enabled to ensure that they can reach any
object. The cronjob system enables the global flag for the language class which
makes all subsequent operations fetch any object in the system regardless of
the language.


Configurability
```````````````

Currently the system will use the site-access settings to determine which
languages to fetch. However sometimes it is crucial to able to override this
per session or function call. This means that most fetches should have an
ability to choose the priority list.

In addition PHP code can change the current prioritized languages used by all
operations by doing::

  // Only Norwegian and English are shown, no matter what INI settings are
  // configured to
  $languages = array( 'nor-NO', 'eng-GB' );
  eZContentLanguage::setPrioritizedLanguages( $languages );

Fetching node/object lists
--------------------------

When fetching list (template fetch function) of nodes or objects it is possible
to override the default language of the site-access. This can be used to narrow
down the languages or to fetch objects in languages normally not
accessible. The fetch function has a parameter called *language* which is an
array of languages to act as the priority list (instead of the one defined in
site-access).

The existing parameters *only_translated* and *language* to the fetch functions
*content/list*, *content/tree*, *content/list_count* and *content/tree_count*
are deprecated since they are not valid anymore. If they are used the system
will give a warning about it.

The *treemenu()* operator also has the *language* parameter.

API
```

Handling languages on the site is done trough the new class eZContentLanguage,
it has methods to fetch the current priority list, full language list or to
manipulate the available languages for the current request.

Site preferences
----------------

To control which languages are used on the site (or site-access) some new INI
settings are available, they are placed under the *RegionalSettings* group in
*site.ini*.
To control the available languages and their priority the setting
*SiteLanguageList* is used, it is an array with language codes (from locale)
which are used, the ones appearing at the top will get higher priority than the
next one.

e.g. to setup a site with two languages, English and German you would do::

  [RegionalSettings]
  SiteLanguageList[]
  SiteLanguageList[]=eng-GB
  SiteLanguageList[]=ger-DE

Changing the order of languages (e.g. German first) is simply rearranging the
setting like this::

  [RegionalSettings]
  SiteLanguageList[]
  SiteLanguageList[]=ger-DE
  SiteLanguageList[]=eng-GB

If this setting is not added the system will use the old *ContentObjectLocale*
setting which means only one language is shown.

Also since it useful to show all languages another setting is available which
is called *ShowUntranslatedObjects*. It is very often used for the
administration page but also be used for any site-access you choose.  If this
setting is enabled the system will not filter away languages but will still use
*SiteLanguageList* for priority.

To enable this setting you would do::

  [RegionalSettings]
  ShowUntranslatedObjects=enabled

eZContentObject::defaultLanguage()
----------------------------------

This static method now behaves a bit different than earlier. It will now return
the top-most language in the language list or the value of
*ContentObjectLocale* if it is set.  If you have been using this in your code
you should see if it still is the correct method to use, using the attribute
*initial_language_code* on the object might be of more use since it returns the
first language for the object.

User interface
``````````````

The user interface (admin) has been updated because of the enhanced
multi-language features. The main idea is to make it easier for the users to
manage translations.

List languages in draft list
----------------------------

Each language of an object now shows up as a separate entry in the draft
list. If a user edits two languages on the same objects it will show two
entries.

Creating and editing
--------------------

When creating a new object it is now possible to choose which language it
should be made in using a drop-down list. The available languages are filtered
based on which class is chosen. If JavaScript is not enabled no drop-down is
shown but instead the user gets an extra screen asking the user to choose the
language.

When editing an object the system will give the user a language choice if the
language is not set in the URL. The user can edit an existing language or
create a new one (if possible).

Editing multiple languages
--------------------------

To ease the translation process it will still be able to edit two or more
translations (UI wise) at the same time, however the process is changed from
earlier. Internally the system actually edits two versions of the same object.

Context menus
-------------

The JS popup menus must has an extra sub-menu, it contains the languages the
object can be edited in or the choice to create a new translation (Another).

Changes to existing functionality
`````````````````````````````````

While the multi-language features for 3.8 tried very hard to not change any
setting or interface which already present (only adding), there were some
points which needed changes, they are explained here:

content/action
--------------

Action AddAssignment/SelectAssignmentLocation
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

This action no longer creates a new version of the object for adding a location
but instead creates the node immediately.

Action RemoveAssignment
~~~~~~~~~~~~~~~~~~~~~~~

This action no longer accepts the POST variable *AssignmentIDSelection* which
contains list of node-assignment ids. Instead use the variable
*LocationIDSelection* and fill it with Node IDs.

The template *location.tpl* (admin) has also been changed to use this new
variable and fetching locations from the node table and not node-assignments.


Outtakes
````````

Due to technical constraints the following items were not able to make it into
3.8.


URL aliases
-----------

For instance if you have a French site where you only show french translations
of the URLs it is desirable to show the french URL alias for the  users. The
translated URL will point to the same internal URL but will use language
priority to choose what to show.

Instead of introducing a new table for this or changing the subtree table, the
existing URL alias table is extended. This table will now have a new field
called *language_code* which tells which language the url is made for. The
system will then create/update these entries for all languages of an object
when it is published.

The table will look like::

  ezurlalias
  +---------------------------+-------------+--------------------------------------+
  | destination_url           | language_id | path_id_string                       |
  +---------------------------+-------------+--------------------------------------+
  | content/view/full/2       |           2 |                                      |
  | content/view/full/13      |          32 | folder_1/pingvinen_har_en_megafon    |
  | content/view/full/13      |           2 | folder_1/the_penguin_has_a_megaphone |
  +---------------------------+-------------+--------------------------------------+

The PHP class eZContentObjectTreeNode will have this translated url-alias
available as a function attribute (*localized_path*), if it is null in the
object it will fetch it from the DB. To avoid having to perform lost of
repeated SQL calls for each node all fetches nodes should be associated with
some collection, then we can go over the nodes in the collection and collect
multiple node IDs and use that for the SQL. (The existing cache system might
also suffice).

The fetch calls should get the possibility to fetch the name and path
immediately, this means that if you know you will use the name and/or path it
can be fetched in one go (after the main SQL). A new parameter is added to the
fetch functions for this.

Search engine
-------------

Add *language_code* on word table, this will separate words per language and
solve the issue when one word exists in two or more languages but with
different meaning. Another issues it solves is when you search in specific
language and the word you are looking for does not exist in this language but
in other ones, then there is no need to include that in the search (it could
even tell the user that).

An example on how it could look::

  ezsearch_word
  +------+--------------+-------+-------------+
  | id   | object_count | word  | language_id |
  +------+--------------+-------+-------------+
  |    1 |            5 | to    |           2 |
  |    2 |            2 | to    |          32 |
  |    3 |            1 | kaker |          32 |
  |    4 |            1 | go    |           2 |
  +------+--------------+-------+-------------+

Search UI
---------

When searching it must be possible to override the default language of the
site-access. This can be used to narrow down the languages or to fetch objects
in languages normally not accessible.

The language must be able to be specified in two forms:

- Using a parameter to the search template fetch function.
- Using a GET/POST parameter from an HTML form.

The languages which can be chosen must be one of the languages defined in the
site-access.

Choosing translation per node
-----------------------------

The user should be able to choose which translation to use per placement. This
means that the priority list would change depending on which node is chosen,
e.g. the Norwegian article is only shown in the Norwegian sub-tree while the
English is shown in the English sub-tree.

This is a highly complicated technical issue which cannot be solved without
major changes to the tree structure.



..
   Local Variables:
   mode: rst
   fill-column: 79
   End:
   vim: et syn=rst tw=79
