FileStoreProvider, an Important Oracle UCM Component

File Store Provider



Date: April 27, 2007

Product and Version: Content Server 10gR3

 

Prerequisites and Recommendations:

The FileStoreProvider component extends the standard file store provider that ships with the 10gR3 core. The standard provider is the mechanism the content server uses to access every file it manages; out of the box, it stores those files in the usual way.

 

The FileStoreProvider is recommended for very large systems that need to place files onto separate storage devices. Once the provider has been installed and configured, removing it causes the system to lose track of where files are located. Similarly, reconfiguring a file store provider in an active system may cause the system to report files as missing.

Background Information:

In the Content Server, data management covers both files and their associated metadata. File management gives users the ability to store and access the files they check in, as well as any files generated from them. Initially a single storage location is all that is needed, but as the volume of content items and files increases, the storage has to be dispersed. This can be done by adding more storage devices and/or by creating a sparser directory structure. The former provides more storage space, while the latter improves file access performance.

 

The second half of data management is storing the metadata, which the Content Server keeps in a relational database, primarily in three tables. The metadata lets users catalogue files and provides a means of creating file descriptors for retrieval. For end users, retrieval is done by the Content Server, so how and where a file is stored may be completely hidden. For component and feature writers, who may need to generate or manipulate files, the metadata provides the means of accessing the desired files.

 

The location of files in the Content Server has remained static over the years. Files and their associated renditions are placed into particular directories according to the revision information given by the content type (dDocType), security group and account. For example, the vault (or native) files are the files that the user has checked in. Their location has traditionally been defined as

 

~/vault/<dDocType>/<account>/<dID>.<dExtension>

 

where dDocType is the content type provided by the user, dID is the system-generated ID that uniquely identifies this revision, and dExtension is the extension of the checked-in file. In the standard model, the system uses the dDocType metadata field to disperse the files across the vault directory. This is a straightforward calculation and is therefore transparent to component and feature writers, who know where files are located and how to manipulate them. However, it also limits storage management. Without careful management of the location metadata mentioned above, directories can become saturated, causing the system to slow down. Also, under the standard configuration, it is difficult to use extra storage devices or to opt out of creating, for example, the web renditions.
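The traditional vault layout can be illustrated with a few lines of Java. This is only a sketch of the path pattern shown above, not the Content Server's own code: the class name, the method name and the sample values are hypothetical, and the account segment is assumed to be left out when a content item has no account.

```java
// Illustrative sketch of the traditional vault layout:
//   <vault root>/<dDocType>/<account>/<dID>.<dExtension>
// LegacyVaultPath and build() are hypothetical names, not Content Server APIs.
final class LegacyVaultPath {
    private LegacyVaultPath() {}

    static String build(String vaultRoot, String dDocType, String account,
                        String dID, String dExtension) {
        StringBuilder path = new StringBuilder(vaultRoot);
        if (path.length() == 0 || path.charAt(path.length() - 1) != '/') {
            path.append('/');
        }
        path.append(dDocType).append('/');
        // Assumption: items without an account simply skip the account directory.
        if (account != null && !account.isEmpty()) {
            path.append(account).append('/');
        }
        return path.append(dID).append('.').append(dExtension).toString();
    }

    public static void main(String[] args) {
        System.out.println(build("/ucm/vault", "Report", "finance", "1042", "doc"));
    }
}
```

Because the directory is derived only from dDocType and the account, every item of one content type and account lands in the same directory, which is the saturation problem described above.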

 

When dealing with large systems, the following features became highly desirable, and they are addressed by the more advanced features of the FileStoreProvider component:

- The ability to relocate files

- The ability to partition files across multiple storage devices

- The ability to make the web-viewable file optional

- The ability to manage and control directory saturation (for example, switching to another directory once one directory holds a given number of files)

- The ability to store files in the database

- An API for extending and adapting the provider to different storage paradigms

Installation:

Warning: Stellent recommends that you deploy and verify this component in your development environment before using it in a production environment.

 

1) Download the file FileStoreProvider.zip.

2) FileStoreProvider.zip is a Stellent component. Use the Component Wizard or the Admin Server Component Manager to install and enable it.

3) Restart the Content Server.

4) Configure the file store provider.

Renditions and Storage:

In most cases, a content item consists of metadata, a primary file and, optionally, an alternate file. The primary file is stored in the vault and any web-viewable files are stored in the web (weblayout) directory. If there is no refinery on the system, the web file is a copy of the alternate file if one exists, otherwise of the primary file. If a conversion engine or refinery is available, the primary file may be sent to it to create web-viewable renditions as well as additional renditions, e.g. thumbnails. Similarly, other components may create auxiliary renditions of the file in the vault and/or the web.

 

From the web browser, a file can be accessed dynamically, via a Content Server service request, or statically. The static web URL is used only when the file is guaranteed to be on the file system; otherwise, the file is delivered dynamically. In the UI, the file store provider only allows the static delivery to be configured. However, the administrator may decide that the 'static' delivery is done as a Content Server service request and is, in essence, dynamic. By definition, we refer to the dynamic access to the file as weburl and the static access as weburl.file.

 

This brings us to the terms used in the rest of the discussion. A rendition is the primary file, the web-viewable file, the alternate file or any of the additional renditions. A storage class is the vault, web or weburl. So a rendition is a version of the file, while a storage class groups renditions by where they are stored or how they are accessed.

 

The rendition and storage class are tied together by the storage rule. The storage rule determines how a content item's renditions are stored on the various devices, e.g. file system or database. Each content item is assigned a storage rule; conversely, given a content item, its storage rule can easily be deduced.

 

One way to understand the relationship between rendition, storage class and storage rule is to walk through a few simple examples. In each example below, a content item consisting of only a primary file is added to the system.

 

  1. A storage rule is defined to be of type FileStorage.

In this scenario, the system copies the primary file into the web directory.

  2. A storage rule is defined to be of type FileStorage and as a webless storage.

This is similar to (1) above, except that the web file does not exist. When there is a request for the web-viewable file, the system returns the vault file, i.e. the primary file.

  3. A storage rule is defined to be of type JdbcStorage.

Both the vault and web files are stored in the database. Note, however, that the JDBC storage is built on top of the file storage and, when necessary, a file can be forced onto the file system. This generally occurs during indexing or conversion.

  4. A storage rule is defined to be of type JdbcStorage with all renditions on the web stored on the file system.

This is similar to (3) above, except that the web-viewable renditions are on the file system.
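The four scenarios can be condensed into a small decision function. The Java below is a sketch under stated assumptions, not the provider's implementation: the class, enum and method names are hypothetical, and the three flags only loosely mirror the storage type, webless setting and file-system renditions setting described in this document.

```java
// Illustrative sketch: where a rendition ends up under each of the four
// example storage rules. All names here are hypothetical.
final class StorageRuleSketch {
    enum StorageType { FILE_STORAGE, JDBC_STORAGE }
    enum Location { FILE_SYSTEM, DATABASE, NONE }

    final StorageType type;
    final boolean weblessStore;     // example 2: no web file is created
    final boolean webOnFileSystem;  // example 4: web renditions stay on disk

    StorageRuleSketch(StorageType type, boolean weblessStore, boolean webOnFileSystem) {
        this.type = type;
        this.weblessStore = weblessStore;
        this.webOnFileSystem = webOnFileSystem;
    }

    // storageClass is "vault" or "web".
    Location locationOf(String storageClass) {
        boolean isWeb = "web".equals(storageClass);
        if (isWeb && weblessStore) {
            return Location.NONE;            // the web file does not exist
        }
        if (type == StorageType.FILE_STORAGE) {
            return Location.FILE_SYSTEM;     // examples 1 and 2
        }
        if (isWeb && webOnFileSystem) {
            return Location.FILE_SYSTEM;     // example 4
        }
        return Location.DATABASE;            // example 3
    }

    // A request for a missing web file falls back to the vault (primary) file.
    String servedFrom(String requestedClass) {
        return locationOf(requestedClass) == Location.NONE ? "vault" : requestedClass;
    }
}
```

The sketch ignores the case where a JDBC-stored file is temporarily forced onto the file system for indexing or conversion.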

Configuration:

After a successful install, the Providers page lets the administrator update the default file store provider to be a file system provider. Edit the default file store provider and click Update. From here, you can change the web, vault and weburl path expressions.

 

Before the file system provider is fully functional, partitions need to be configured. A partition defines the root path of a rendition's location.

 

On installation, the component also adds three metadata fields: xPartitionId, xWebFlag and xStorageRule. These metadata fields are used as follows:

 

xPartitionId - This metadata field is used in conjunction with the PartitionList table to determine the root location of the content item's files. It is recommended that this field be hidden in the UI, since the partition selection algorithm provides its value.

 

xWebFlag - This metadata field determines whether a content item has a web-viewable file. Consequently, if the system has content items that have only vault files, removing this metadata field causes the system to expect a web-viewable file for them and may harm the system. The name of the metadata field may be specified with the configuration value WebFlagColumn.

 

xStorageRule - This metadata field tracks the rule that was used to determine how the file is stored. The name of the metadata field may be specified with the configuration value StorageRuleField.

 

The above metadata fields are added by the component on startup. If the fields have been deleted and should remain absent from the system, use the configuration flag FsAddExtraMetaFields to stop the component from adding them again.

 

On installation, the component also adds the database tables FileStorage and FileCache. These tables are used exclusively by the JdbcStorage file store provider. The FileStorage table contains the contents of the files and uses the dID of the content item together with the rendition to identify which renditions belong to which content item. The FileCache table records which files have been written out to the file system because the system, for one reason or another, required them there. These files are mostly temporary, and the system deletes them as part of a scheduled event.

 

Note that the system only supports one primary file store provider and by default it has been named the ‘DefaultFileStore’ provider.

 

Configuration Resource tables:

Four main tables are used to configure and handle variations in file path locations. These are configuration resource tables, not database tables. The PartitionList table is initially empty and has a UI that lets a user add, edit or delete rows. The PathMetaData and PathConstruction tables are used for path locations, and the provided defaults cover most scenarios. Finally, extending the FileSystemFileStoreAlgorithmFilters table requires a component containing Java code.

 

Note that the PathMetaData, PathConstruction and FileSystemFileStoreAlgorithmFilters tables are defined in the providers.hda and are provider specific, while the PartitionList table is defined in the ~/data/filestore/config/fsconfig.hda file and has a more global use.

 

FileSystemFileStoreAlgorithmFilters:

This table maps an algorithm name to an implementation of the FilterImplementor interface. The algorithm can be referenced in the PathMetaData table and is used to calculate the desired path field. When the file parameters object is null, the class implementing the algorithm must return the metadata fields it requires for its calculation. Through the ExecutionContext, the doFilter method receives information about the field, content item and file store provider that initiated the call. In particular, the file system provider passes the algorithm the following information via the ExecutionContext. Bear in mind that other file store providers may choose to pass in more or possibly different information.

 

// Objects the file system provider makes available through the ExecutionContext
Properties fieldProperties = (Properties) context.getCachedObject("FieldProperties");
Parameters data = (Parameters) context.getCachedObject("FileParameters");
Map localData = (Map) context.getCachedObject("LocalProperties");
String algorithm = (String) context.getCachedObject("AlgorithmName");

 

PathMetaData:

This table determines which metadata is used to compute the location of a file. The metadata may come directly from the content item's metadata or be calculated by an algorithm.

The columns are defined and used as follows:

 

  • FieldName - name of the field as it appears in the path expression.

  • GenerationAlgorithm - optional; if defined, specifies the algorithm used to resolve or compute the value of the field.

  • RequiredForStorage - defines the storage classes for which this metadata is required. Possible values are #all, web, vault. The field is optional for all storage classes not specified. Consequently, if this column is empty, the metadata field is optional for all renditions or storage classes. Note that if an algorithm has been specified, this value is empty; the algorithm uses the ArgumentFields column to dictate which fields are required.

  • OverrideClientValue - false by default. When set to true, the file store provider overrides the value even if one is provided by the user. This value is only used if a GenerationAlgorithm is specified.

  • Arguments - optional arguments passed into the filter.

  • ArgumentFields - comma-separated list of fields required by the arguments and consequently required by the algorithm.
 
PartitionList:

This table describes the partitions that are available to the partitionSelection algorithm, which uses them to decide under which root path a file is stored. The columns of the table are used as follows:

 

  • PartitionName – specifies the name of the partition. This name is referenced in the path expression.

 

  • PartitionRoot – argument passed into the partitionSelection algorithm.

 

  • IsActive – determines if the partition is currently active and accepts new files.

 

  • CapacityCheckInterval – specifies the interval in seconds used in determining the available disk space. This may not work on all platforms.

 

  • SlackBytes – determines if there is sufficient space. If the available space is lower than the slack bytes, the partition is no longer used for contribution.

 

  • DuplicationMethods – available methods are link and copy. Note not all methods are available on all platforms and the ‘copy’ method is recommended by default.
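How the IsActive and SlackBytes columns might interact can be sketched in Java. This is an assumption-laden illustration, not the actual partitionSelection algorithm: the class and interface names are hypothetical, the sketch simply takes the first eligible partition (the real selection policy is not described here), and it does not model CapacityCheckInterval.

```java
import java.io.File;
import java.util.List;

// Illustrative sketch of choosing a partition for new content.
// Partition, SpaceProbe and select() are hypothetical names.
final class PartitionSelectionSketch {
    static final class Partition {
        final String name;
        final String root;
        final boolean active;
        final long slackBytes;

        Partition(String name, String root, boolean active, long slackBytes) {
            this.name = name;
            this.root = root;
            this.active = active;
            this.slackBytes = slackBytes;
        }
    }

    // Abstracts the free-space check so it can be replaced in tests.
    interface SpaceProbe {
        long usableBytes(String root);
    }

    // Default probe backed by the JVM's view of the file system.
    static final SpaceProbe DISK = root -> new File(root).getUsableSpace();

    // Returns the first active partition whose free space is not below its
    // slack bytes, or null if no partition qualifies.
    static Partition select(List<Partition> partitions, SpaceProbe probe) {
        for (Partition p : partitions) {
            if (!p.active) {
                continue;                       // not accepting new files
            }
            if (probe.usableBytes(p.root) < p.slackBytes) {
                continue;                       // too little space left
            }
            return p;
        }
        return null;
    }
}
```

Passing the probe as a parameter keeps the sketch testable without real disks; in practice `DISK` would be used, subject to the platform caveat noted for CapacityCheckInterval.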

 

 
PathConstruction:

This table maps a file to a path. The path is made up of components, where a component may be calculated by an algorithm or taken from an IdocScript variable, an environment variable or a metadata lookup.

 

The columns of the table are defined as follows:

 

  • FileStore - specifies the storage class whose path is being calculated. Possible values are web, vault, weburl, weburl.file.

 

  • PathExpression – defines the path. This path is parsed into components, which are resolved via the PathMetaData field definitions as described below.

 

  • AutoCreateLimit - specifies the depth of the directories that may be created.

 

  • IsWritable - specifies if the storage location is writable.

 

  • StorageRule – specifies the rule this path construction belongs to.

 

The most interesting and important column of the PathConstruction table is the PathExpression column. As mentioned, it defines the path or location of the file. A path is broken into its components at the slashes. Each component is either a static string or a sequence of dynamic parts. A dynamic part is enclosed in '$' characters and can be interpreted in the following ways:

 

  • It may be a field defined in the PathMetaData table, in which case it may be mapped to an algorithm, e.g. $dDocType$

  • If it has the prefix #env., it is an environment variable, e.g. $#env.VaultDir$

  • It may be an IdocScript variable, e.g. $HttpWebRoot$

 

 

 

For example, the standard vault location is defined as

 

$PartitionRoot$/vault/$dDocType$/$dDocAccount$/$dID$$ExtensionSeparator$$dExtension$

 

When parsed, this turns into 5 components, which are interpreted according to the rules specified in the PathMetaData table as follows:

 

1. $PartitionRoot$ - mapped to the partitionSelection algorithm, which uses xPartitionId as a lookup into the PartitionList table to determine the root.

2. /vault/ - a string, i.e. no calculation or substitution.

3. $dDocType$ - according to the PathMetaData table, this is a lookup in the file parameters.

4. $dDocAccount$ - mapped to the documentAccount algorithm, which takes dDocAccount and parses it into the standard content server account presentation with all the appropriate delimiters.

5. $dID$$ExtensionSeparator$$dExtension$ - this component has three parts:

a) $dID$ - like dDocType, this is defined in the file parameters and is a required field.

b) $ExtensionSeparator$ - determined by an algorithm.

c) $dExtension$ - similar to dDocType.
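The resolution steps walked through above can be sketched in Java. This is a simplified illustration, not the provider's parser: the class and method names are hypothetical, the lookup order (environment, algorithm, file parameters, IdocScript variable) is an assumption, algorithms are modelled as plain functions rather than FilterImplementor classes, and the documentAccount algorithm is left out so dDocAccount is used as given. The sample expression has the same shape as the standard vault expression, with a slash after $PartitionRoot$.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

// Illustrative sketch of resolving a path expression such as
//   $PartitionRoot$/vault/$dDocType$/$dDocAccount$/$dID$$ExtensionSeparator$$dExtension$
// All class and method names are hypothetical.
final class PathExpressionSketch {
    private final Map<String, String> env;
    private final Map<String, String> idocVars;
    private final Map<String, Function<Map<String, String>, String>> algorithms;

    PathExpressionSketch(Map<String, String> env, Map<String, String> idocVars,
                         Map<String, Function<Map<String, String>, String>> algorithms) {
        this.env = env;
        this.idocVars = idocVars;
        this.algorithms = algorithms;
    }

    // A path is broken into components at the slashes.
    static String[] components(String expression) {
        return expression.split("/");
    }

    // Replaces every $name$ part with its resolved value; static text is kept.
    String resolve(String expression, Map<String, String> fileParams) {
        StringBuilder out = new StringBuilder();
        int i = 0;
        while (i < expression.length()) {
            int start = expression.indexOf('$', i);
            if (start < 0) {
                out.append(expression.substring(i));
                break;
            }
            int end = expression.indexOf('$', start + 1);
            if (end < 0) {
                throw new IllegalArgumentException("Unterminated part in " + expression);
            }
            out.append(expression, i, start);
            out.append(lookup(expression.substring(start + 1, end), fileParams));
            i = end + 1;
        }
        return out.toString();
    }

    private String lookup(String name, Map<String, String> fileParams) {
        String value;
        if (name.startsWith("#env.")) {
            value = env.get(name.substring("#env.".length()));
        } else if (algorithms.containsKey(name)) {
            value = algorithms.get(name).apply(fileParams);
        } else if (fileParams.containsKey(name)) {
            value = fileParams.get(name);
        } else {
            value = idocVars.get(name);
        }
        if (value == null) {
            throw new IllegalArgumentException("Cannot resolve $" + name + "$");
        }
        return value;
    }

    // Worked example with made-up values.
    static String demo() {
        Map<String, String> partitionRoots = new HashMap<>();
        partitionRoots.put("p1", "/mnt/store1");
        Map<String, Function<Map<String, String>, String>> algorithms = new HashMap<>();
        algorithms.put("PartitionRoot", p -> partitionRoots.get(p.get("xPartitionId")));
        algorithms.put("ExtensionSeparator", p -> ".");
        Map<String, String> env = new HashMap<>();
        Map<String, String> idocVars = new HashMap<>();
        Map<String, String> fileParams = new HashMap<>();
        fileParams.put("xPartitionId", "p1");
        fileParams.put("dDocType", "Report");
        fileParams.put("dDocAccount", "finance");
        fileParams.put("dID", "1042");
        fileParams.put("dExtension", "doc");
        return new PathExpressionSketch(env, idocVars, algorithms).resolve(
            "$PartitionRoot$/vault/$dDocType$/$dDocAccount$/$dID$$ExtensionSeparator$$dExtension$",
            fileParams);
    }
}
```

With the made-up values in `demo()`, the expression resolves to `/mnt/store1/vault/Report/finance/1042.doc`, and `components()` yields the five components listed above.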

 

StorageRules:

This table describes the rules used for storing a content item's files. A rule specifies which path expression to use for which storage class, and it also determines how the files are stored and by what mechanism.

 

  • StorageRule – name of the rule. The rule's name is computed via the dynamic include and stored in the content item's metadata field xStorageRule.

 

  • StorageType – determines the storage implementation. Currently accepted values are FileStorage and JdbcStorage. With FileStorage, files are stored only on the file system. With JdbcStorage, files are stored in the database by default.

 

  • IsWeblessStore - Used to specify if this is a system that allows webless files. When set to true, it is assumed by default that a newly created content item does not have a web-viewable file. In certain circumstances it is, however, desirable or necessary to insist on a web-viewable. Consequently, an argument in the calling code can be used to specify that a web file needs to be created. This information (if it has a web file or not) is stored in the xWebFlag metadata field.

 

  • RenditionsOnFileSystem – Used by JdbcStorage to determine if any files are to be stored on the file system instead of the database.

 

 

When the default file store provider is upgraded, a default rule is created. Note that deleting or editing a storage rule may result in the system misplacing files.

URL parsing guidelines:

In the standard configuration, the URL contains security and dDocType information as well as the dDocName and extension. The URL and the web location are constructed as follows:

 

…/groups/$dSecurityGroup$/$dDocAccount$/documents/$dDocType$/$dDocName$.$dWebExtension$

 

The ‘groups’ separator indicates to the system that the directories that follow are the name of the security group the content item belongs to and its account. Note that accounts are optional and are consequently computed by an algorithm. After the security information comes the ‘documents’ separator, which is immediately followed by the dDocType, i.e. the content type. The last part of the URL above is the dDocName and its format.

 

Since the URL is expected in this format, the system can successfully extract metadata from it. More importantly, it can determine the security information for the content item and derive the access privileges for a particular user.

 

The parsing guidelines have been expanded to allow for dispersion in the web directory. We keep the ‘groups’ separator, but replace the ‘documents’ separator with ‘sg’. When the parser encounters the ‘sg’ separator, it no longer assumes that the remaining part of the URL is /$dDocType$/$dDocName$.$dWebExtension$. Instead, the parser looks for the dispersion end marker ‘d’. Once the ‘d’ is encountered, the system assumes that what follows contains the dDocName and dWebExtension as before. This means that the system can now successfully parse URLs of the form

 

../groups/$dSecurityGroup$/$dDocAccount$/sg/<dispersion>/<dispersion>…/d/$dDocName$.$dWebExtension$
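The two URL shapes can be told apart with a small parser. The Java below is a sketch of the guidelines only, not the server's parser: the class name and the "dispersion" key are hypothetical, the account is assumed to occupy whatever segments lie between the security group and the next separator, and rendition or revision suffixes in the file name are not handled. A real parser would also need to cope with accounts or dispersion directories that happen to be named like a separator.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Illustrative sketch of extracting metadata from a web URL.
// WebUrlParserSketch and the "dispersion" key are hypothetical.
final class WebUrlParserSketch {
    private WebUrlParserSketch() {}

    static Map<String, String> parse(String url) {
        List<String> parts = Arrays.asList(url.split("/"));
        int groups = parts.indexOf("groups");
        if (groups < 0 || groups + 1 >= parts.size()) {
            throw new IllegalArgumentException("No 'groups' separator in " + url);
        }
        Map<String, String> meta = new LinkedHashMap<>();
        meta.put("dSecurityGroup", parts.get(groups + 1));

        // The account (optional) runs up to the 'documents' or 'sg' separator.
        int sep = find(parts, groups + 2, "documents", "sg");
        if (sep < 0) {
            throw new IllegalArgumentException("No 'documents' or 'sg' separator in " + url);
        }
        meta.put("dDocAccount", String.join("/", parts.subList(groups + 2, sep)));

        int fileIndex;
        if (parts.get(sep).equals("documents")) {
            // Standard form: documents/<dDocType>/<file>
            meta.put("dDocType", parts.get(sep + 1));
            fileIndex = sep + 2;
        } else {
            // Dispersed form: sg/<dispersion>/.../d/<file>
            int end = find(parts, sep + 1, "d", "d");
            if (end < 0) {
                throw new IllegalArgumentException("No 'd' end marker in " + url);
            }
            meta.put("dispersion", String.join("/", parts.subList(sep + 1, end)));
            fileIndex = end + 1;
        }
        if (fileIndex >= parts.size()) {
            throw new IllegalArgumentException("No file name in " + url);
        }
        String file = parts.get(fileIndex);
        int dot = file.lastIndexOf('.');
        meta.put("dDocName", dot < 0 ? file : file.substring(0, dot));
        meta.put("dWebExtension", dot < 0 ? "" : file.substring(dot + 1));
        return meta;
    }

    // Index of the first segment at or after 'from' equal to a or b, else -1.
    private static int find(List<String> parts, int from, String a, String b) {
        for (int i = from; i < parts.size(); i++) {
            String p = parts.get(i);
            if (p.equals(a) || p.equals(b)) {
                return i;
            }
        }
        return -1;
    }
}
```

Note that in the dispersed form the content type is no longer recoverable from the URL, while the security group and account still are, which is what the access check depends on.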

 

Database Configuration:

The following configuration values control when the file cache is cleaned up. Note that the system only cleans up files that have an entry in the FileCache table.

 

FsCacheThreshold – The cache size at which the system starts deleting files that are older than the minimum age specified by the FsMinimumFileCacheAge parameter. The unit is megabytes and the default is 100.

 

FsMaximumFileCacheAge – All files older than this are deleted. The unit is days and the default is 365 (one year).

 

FsMinimumFileCacheAge – Used in conjunction with the FsCacheThreshold parameter to decide which files to delete.
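The interplay of the three parameters can be sketched as a selection function in Java. This is one reading of the descriptions above, not the scheduled event's actual logic: the class and method names are hypothetical, and the sketch assumes that once the threshold is exceeded every file older than the minimum age becomes a candidate (whether the real cleanup stops as soon as the cache drops below the threshold is not stated).

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch of picking cached files for deletion.
// FileCacheCleanupSketch and CachedFile are hypothetical names.
final class FileCacheCleanupSketch {
    static final class CachedFile {
        final String path;
        final long sizeBytes;
        final long ageDays;

        CachedFile(String path, long sizeBytes, long ageDays) {
            this.path = path;
            this.sizeBytes = sizeBytes;
            this.ageDays = ageDays;
        }
    }

    static List<CachedFile> selectForDeletion(List<CachedFile> cache,
                                              long thresholdMb,   // FsCacheThreshold
                                              long minAgeDays,    // FsMinimumFileCacheAge
                                              long maxAgeDays) {  // FsMaximumFileCacheAge
        long totalBytes = 0;
        for (CachedFile f : cache) {
            totalBytes += f.sizeBytes;
        }
        boolean overThreshold = totalBytes > thresholdMb * 1024L * 1024L;

        List<CachedFile> doomed = new ArrayList<>();
        for (CachedFile f : cache) {
            boolean tooOld = f.ageDays > maxAgeDays;                 // always deleted
            boolean evictable = overThreshold && f.ageDays > minAgeDays;
            if (tooOld || evictable) {
                doomed.add(f);
            }
        }
        return doomed;
    }
}
```

The minimum age used in any call is a caller-supplied value here; this document does not state a default for FsMinimumFileCacheAge.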

 

Configuration Parameters:

We briefly discuss some of the configuration parameters and their locations.

 

In the intradoc.cfg, the following parameter may be specified:

 

StorageDir – set to a root directory to be used as the root directory for all partitions where the PartitionRoot column value has not been specified. In this case, the storage directory plus the partition name will be used to create the PartitionRoot parameter.

 

In the provider definition file, provider.hda, the following parameters and classes are standard for a file system store provider.

 

ProviderType=FileStore

ProviderClass=intradoc.filestore.BaseFileStore

IsPrimaryFileStore=true

 

# Configuration information specific to a file system store provider.

ProviderConfig=intradoc.filestore.filesystem.FileSystemProviderConfig

EventImplementor=intradoc.filestore.filesystem.FileSystemEventImplementor

DescriptorImplementor=intradoc.filestore.filesystem.FileSystemDescriptorImplementor

AccessImplementor=intradoc.filestore.filesystem.FileSystemAccessImplementor

 

Usage Examples:

In this section, we explicitly list the contents of the tables contained in the provider definition file for each of the examples. This may give the misleading impression that the administrator is required to edit the provider definition file manually. However, configuring the file store provider does not require manual editing of this file, since the system through the user interface creates all the tables necessary and provides sufficient defaults for most scenarios.

 

In most of the examples below, we use the following PathMetaData table and its definitions. Note that some of the table's columns have been trimmed to save space and improve presentation.

 

@ResultSet PathMetaData (6 columns; trimmed columns omitted)

FieldName            GenerationAlgorithm    RequiredForStorage
dID                  (empty)                #all
dDocName             (empty)                #all
dDocAccount          documentAccount        (empty)
dDocType             (empty)                #all
dExtension           (empty)                #all
dWebExtension        (empty)                weburl
dSecurityGroup       (empty)                #all
dRevisionID          (empty)                #all
dReleaseState        (empty)                #all
dStatus              (empty)                web
xPartitionId         partitionSelection     (empty)
ExtensionSeparator   extensionSeparator     (empty)
xWebFlag             (empty)                (empty)
RenditionId          (empty)                #all
RevisionLabel        revisionLabel          (empty)
RenditionSpecifier   renditionSpecifier     (empty)

@end

 

 
How to configure the component to use the usual or standard file paths:

The file system store provider can be configured to place the files in the standard locations. The first step is to define the storage rule. In this case, the storage rule will be of type FileStorage, since all the files are to be stored on the file system. Next, the path construction for each of the storage classes needs to be defined for the rule. In general, the tail end of the path should be standard for all usage examples, unless you are willing to limit the system’s functionality. For example, by using a non-standard filename, the system will not work well with hcs* files. However, the root path can be changed at will and should not affect functionality.

 

@ResultSet StorageRules (4 columns)

StorageRule   StorageType   IsWeblessStore   RenditionsOnFileSystem
default       FileStorage   (empty)          (empty)

@end

 

@ResultSet PathConstruction (5 columns)

FileStore:        vault
PathExpression:   $#env.VaultDir$$dDocType$/$dDocAccount$/$dID$$ExtensionSeparator$$dExtension$
AutoCreateLimit:  6
IsWritable:       true
StorageRule:      default

FileStore:        weburl
PathExpression:   $HttpWebRoot$groups/$dSecurityGroup$/$dDocAccount$/documents/$dDocType$/$dDocName$$RenditionSpecifier$$RevisionLabel$$ExtensionSeparator$$dWebExtension$
AutoCreateLimit:  3
IsWritable:       false
StorageRule:      default

FileStore:        web
PathExpression:   $#env.WeblayoutDir$groups/$dSecurityGroup$/$dDocAccount$/documents/$dDocType$/$dDocName$$RenditionSpecifier$$RevisionLabel$$ExtensionSeparator$$dWebExtension$
AutoCreateLimit:  3
IsWritable:       true
StorageRule:      default

@end

 

In this configuration, the vault, web and weburl storage classes need to be defined in the PathConstruction table. The path expression for ‘vault’ has already been discussed, so we only look at the path expression for ‘web’. It is quite similar to weburl and differs only in its root: the ‘web’ path is an absolute path on the file system, while the weburl is (as its name implies) a URL served up by a web server.

 

The path expression of ‘web’ is defined to be

 

$#env.WeblayoutDir$groups/$dSecurityGroup$/$dDocAccount$/documents/$dDocType$/$dDocName$$RenditionSpecifier$$RevisionLabel$$ExtensionSeparator$$dWebExtension$

 

This expression is parsed into the following component pieces:

  1. $#env.WeblayoutDir$ - looked up in the shared environment under the key ‘WeblayoutDir’. The content server defines this as the physical root path of the weblayout directory.

  1a. (alternate for weburl) $HttpWebRoot$ - an IdocScript variable.

  2. /groups/ - a literal string.

  3. $dSecurityGroup$ - according to the PathMetaData table this is a required field, so it must be provided by the caller or descriptor creator. It is part of the content item’s metadata.

  4. $dDocAccount$ - mapped to the documentAccount algorithm, which takes dDocAccount and parses it into the standard content server account presentation with all the appropriate delimiters.

  5. /documents/ - a literal string.

  6. $dDocType$ - same as dSecurityGroup above.

  7. $dDocName$ - same as dSecurityGroup above.

  8. $RenditionSpecifier$ - provided by the renditionSpecifier algorithm, which is only of interest if the system is creating additional renditions, e.g. thumbnails. Otherwise, it returns an empty string.

  9. $RevisionLabel$ - provided by the revisionLabel algorithm, which, depending on the status of the content item, adds a ‘~dRevLabel’ to the path.

  10. $ExtensionSeparator$ - provided by the extensionSeparator algorithm, which by default just returns ‘.’.

  11. $dWebExtension$ - a required field for the web and weburl storage classes, passed in via the file parameters.
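The resolution steps listed above can be sketched in a few lines of Python. This is only an illustration of the idea, not the provider's implementation: the function evaluate, the ALGORITHMS table, the useRevLabel flag, and the sample paths are all assumptions made for the example, and the documentAccount algorithm is reduced to a pass-through placeholder.

```python
import re

# A token is anything between a pair of '$' characters.
TOKEN = re.compile(r"\$([^$]+)\$")

# Placeholder algorithms (assumptions for illustration only).
ALGORITHMS = {
    # The real documentAccount algorithm inserts account delimiters;
    # here the account is passed through unchanged.
    "dDocAccount": lambda meta: meta.get("dDocAccount", ""),
    # Empty unless an additional rendition is being addressed.
    "RenditionSpecifier": lambda meta: meta.get("renditionSpecifier", ""),
    # 'useRevLabel' is a hypothetical flag standing in for the item status.
    "RevisionLabel": lambda meta: (
        "~" + meta["dRevLabel"] if meta.get("useRevLabel") else ""
    ),
    "ExtensionSeparator": lambda meta: ".",
}


def evaluate(expression, meta, env):
    """Replace each $token$ with an environment value, an algorithm
    result, or a required metadata field."""
    def resolve(match):
        name = match.group(1)
        if name.startswith("#env."):
            return env[name[len("#env."):]]
        if name in ALGORITHMS:
            return ALGORITHMS[name](meta)
        if name not in meta:
            raise KeyError("required field missing: " + name)
        return meta[name]

    return TOKEN.sub(resolve, expression)


WEB_EXPRESSION = (
    "$#env.WeblayoutDir$/groups/$dSecurityGroup$/$dDocAccount$/documents/"
    "$dDocType$/$dDocName$$RenditionSpecifier$$RevisionLabel$"
    "$ExtensionSeparator$$dWebExtension$"
)

if __name__ == "__main__":
    env = {"WeblayoutDir": "/opt/ucm/weblayout"}  # hypothetical root path
    meta = {
        "dSecurityGroup": "public",
        "dDocAccount": "hr",
        "dDocType": "adacct",
        "dDocName": "doc001",
        "dWebExtension": "pdf",
    }
    print(evaluate(WEB_EXPRESSION, meta, env))
```

A missing required field such as dSecurityGroup raises an error in this sketch, mirroring the rule that required fields must be supplied by the caller or descriptor creator.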

 

 

How to have a webless or optional web store:

The storage rule from above is now configured with IsWeblessStore set to true, so the web-viewable file is not created by default. However, if the document is processed through the IBR, WebForms, or any other component that requires a web-viewable file, the web file is created. The location of the files is the same as in the ‘standard’ configuration above. Because a file may not have a web rendition, the weburl path needs to be adjusted. Also note the use of weburl.file, which is used to compute the URL when the web-viewable file actually exists. The metadata field xWebFlag is used to determine how the file is to be served up in the browser.

 

@ResultSet StorageRules
4
StorageRule
StorageType
IsWeblessStore
RenditionsOnFileSystem
default
FileStorage
true

@end

 

@ResultSet PathConstruction
5
FileStore
PathExpression
AutoCreateLimit
IsWritable
StorageRule
vault
$#env.VaultDir$$dDocType$/$dDocAccount$/$dID$$ExtensionSeparator$$dExtension$
6
true
default
weburl.file
$HttpWebRoot$groups/$dSecurityGroup$/$dDocAccount$/documents/$dDocType$/$dDocName$$RenditionSpecifier$$RevisionLabel$$ExtensionSeparator$$dWebExtension$
3
false
default
web
$#env.WeblayoutDir$groups/$dSecurityGroup$/$dDocAccount$/documents/$dDocType$/$dDocName$$RenditionSpecifier$$RevisionLabel$$ExtensionSeparator$$dWebExtension$
3
true
default
@end

 

How to configure the files to be stored in the database:

To store files in the database, we need a storage rule that is of type JdbcStorage. By default, all content items belonging to this rule have their files stored in the database. However, even though the files are stored in the database, there is the presumption of an underlying file system and the system may need to temporarily cache a file on the file system. In particular, this may happen for indexing or for some conversions.

 

Note: a rule can be configured to always store renditions belonging to a given storage class on the file system. This is probably most useful for systems that want to store vault files in the database but web files on the file system.

 

In the ‘default’ rule below, all files are stored in the database, while the ‘filesInWeb’ rule stores the vault files in the database and the web files on the file system. The path construction is as before.

 

@ResultSet StorageRules
4
StorageRule
StorageType
IsWeblessStore
RenditionsOnFileSystem
default
JdbcStorage


filesInWeb
JdbcStorage

web
@end

 

@ResultSet PathConstruction
5
FileStore
PathExpression
AutoCreateLimit
IsWritable
StorageRule
vault
$#env.VaultDir$$dDocType$/$dDocAccount$/$dID$$ExtensionSeparator$$dExtension$
6
true
default
weburl.file
$HttpWebRoot$groups/$dSecurityGroup$/$dDocAccount$/documents/$dDocType$/$dDocName$$RenditionSpecifier$$RevisionLabel$$ExtensionSeparator$$dWebExtension$
3
true
default
web
$#env.WeblayoutDir$groups/$dSecurityGroup$/$dDocAccount$/documents/$dDocType$/$dDocName$$RenditionSpecifier$$RevisionLabel$$ExtensionSeparator$$dWebExtension$
3
true
default
@end
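As a rough illustration of how the two rules differ, the following Python sketch decides where a rendition of a given storage class would be kept. It is a reading of the description above, not the provider's code: the function storage_location is hypothetical, and treating RenditionsOnFileSystem as a comma-separated list of storage classes is an assumption.

```python
def storage_location(rule, storage_class):
    """Return 'database' or 'filesystem' for one rendition.

    rule is a dict holding the StorageRules columns; storage_class is
    a name such as 'vault' or 'web'.
    """
    if rule.get("StorageType") != "JdbcStorage":
        # Any non-JDBC rule (e.g. FileStorage) keeps files on disk.
        return "filesystem"

    # Assumption: the column may list several classes, comma-separated.
    raw = rule.get("RenditionsOnFileSystem", "")
    on_file_system = {name.strip() for name in raw.split(",") if name.strip()}
    return "filesystem" if storage_class in on_file_system else "database"


RULES = {
    "default": {"StorageType": "JdbcStorage", "RenditionsOnFileSystem": ""},
    "filesInWeb": {"StorageType": "JdbcStorage", "RenditionsOnFileSystem": "web"},
}

if __name__ == "__main__":
    for rule_name, rule in RULES.items():
        for storage_class in ("vault", "web"):
            print(rule_name, storage_class, storage_location(rule, storage_class))
```

Even when the sketch answers 'database', the system may still cache a temporary copy on the file system, for example for indexing or conversion.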

 

Altered paths and algorithms at work:

Up to now, the examples have kept the file paths consistent with the standard configuration. For very large systems, however, this is likely to result in directory saturation. The examples below show ways to disperse files.

How to use Partitioning:

The file store provider makes it easy to use partitions to create a sparser directory structure. By default, the xPartitionId metadata field is used and becomes part of the revision’s metadata. It is recommended to hide this field from the UI and let the partition selection algorithm determine which partition to use. The algorithm looks at all the active partitions and assigns them in round-robin order as new content enters the system. Each partition has an entry in the PartitionList table and can be declared active. The PartitionRoot is calculated from the xPartitionId, whose value is a lookup key into the PartitionList table. If no xPartitionId is specified, the system finds the next available active partition and uses it for the location calculation. The xPartitionId is then stored as part of the content item’s metadata.

 

To use the partition selection, define the vault storage class in the PathConstruction table as follows:

vault
$PartitionRoot$/$dDocType$/$dDocAccount$/$dID$$ExtensionSeparator$$dExtension$
6
true

 

If at any point the administrator decides that a particular partition should no longer be open to contribution, the partition can be edited (via the partition UI) so that it is no longer active, i.e. IsActive is false.
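The round-robin behaviour described for partition selection can be pictured with a small Python sketch. It is an assumption-laden illustration rather than the actual algorithm: the class PartitionSelector, the column name PartitionName, and the sample roots are invented for the example.

```python
class PartitionSelector:
    """Round-robin choice over active partitions (illustrative only)."""

    def __init__(self, partitions):
        # Each partition is a dict with PartitionName, PartitionRoot, IsActive.
        self.partitions = partitions
        self._next = 0  # index at which the next search starts

    def select(self, x_partition_id=None):
        """Return the partition for a content item.

        An existing xPartitionId is a plain lookup key, so content stored
        earlier still resolves even if its partition was later deactivated.
        """
        if x_partition_id:
            for partition in self.partitions:
                if partition["PartitionName"] == x_partition_id:
                    return partition
            raise KeyError("unknown partition: " + x_partition_id)

        count = len(self.partitions)
        for offset in range(count):
            index = (self._next + offset) % count
            partition = self.partitions[index]
            if partition["IsActive"]:
                self._next = (index + 1) % count
                return partition
        raise RuntimeError("no active partition available")


if __name__ == "__main__":
    selector = PartitionSelector([
        {"PartitionName": "p1", "PartitionRoot": "/vol1/vault", "IsActive": True},
        {"PartitionName": "p2", "PartitionRoot": "/vol2/vault", "IsActive": True},
        {"PartitionName": "p3", "PartitionRoot": "/vol3/vault", "IsActive": True},
    ])
    print([selector.select()["PartitionName"] for _ in range(4)])
```

Setting IsActive to false on a partition simply removes it from the rotation for new content; lookups by an already stored xPartitionId are unaffected in this sketch.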

 

How to limit the number of files in a directory:

Another way of dispersing files is to alter the path so that files get partitioned out by the dID of the content item. In the example below, the directories are limited to 10,000 files plus extra files for additional renditions.

 

Note the dID[-12:-10:0] in the path expression. This is interpreted as follows: take the characters starting 12 back from the end of the string, up to the character 10 back from the end of the string. Pad the resulting string with 0 characters to length 2, which is 12-10.

For example, if your path expression contains:

$dID[-12:-10:0]$/$dID[-10:-8:0]$/$dID[-8:-4:0]$

and dID is 1234567890, the result is 00/12/3456.
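The slicing rule can be checked with a short Python sketch. The helper names slice_pad and expand are made up for this example, and padding on the left is an assumption (the worked example only shows a fully empty slice being padded).

```python
import re

# Matches tokens of the form $field[start:end:pad]$, e.g. $dID[-12:-10:0]$.
SLICE_TOKEN = re.compile(r"\$(\w+)\[(-?\d+):(-?\d+):(.)\]\$")


def slice_pad(value, start, end, pad):
    """Take the characters between two negative offsets counted from the
    end of value, then left-pad the piece to width end - start."""
    length = len(value)
    low = max(length + start, 0)   # clamp when the string is too short
    high = max(length + end, 0)
    piece = value[low:high]
    return piece.rjust(end - start, pad)


def expand(expression, meta):
    """Replace every slice token in expression using the metadata in meta."""
    def resolve(match):
        field, start, end, pad = match.groups()
        return slice_pad(str(meta[field]), int(start), int(end), pad)

    return SLICE_TOKEN.sub(resolve, expression)


if __name__ == "__main__":
    expression = "$dID[-12:-10:0]$/$dID[-10:-8:0]$/$dID[-8:-4:0]$"
    print(expand(expression, {"dID": "1234567890"}))  # prints 00/12/3456
```

Because the last four digits of dID never appear in the directory path, each directory receives at most 10,000 distinct dID values, which matches the limit stated above.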

 

 

Additional Keywords:

file, database, webless, vaultless, storage, configuration, install, jdbc, file system

For More Information:

 

 

-----------------------------

Rev:  April 27, 2007
