Now that you’ve successfully logged into Dataverse with a superuser account after going through a basic Installation, you’ll need to secure and configure your installation.
Settings within Dataverse itself are managed via JVM options or by manipulating values in the setting
table directly or through API calls.
Once you have finished securing and configuring your Dataverse installation, you may proceed to the Admin Guide for more information on the ongoing administration of a Dataverse installation. Advanced configuration topics are covered in the TwoRavens, Shibboleth and OAuth Login: ORCID, GitHub, Google sections.
Remember how under “Decisions to Make” in the Preparation section we mentioned you’ll need to make a decision about whether or not to introduce a proxy in front of Dataverse such as Apache or nginx? The time has come to make that decision.
The need to redirect port HTTP (port 80) to HTTPS (port 443) for security has already been mentioned above and the fact that Glassfish puts these services on 8080 and 8181, respectively, was touched on in the Installation section. In production, you don’t want to tell your users to use Dataverse on ports 8080 and 8181. You should have them use the standard HTTPS port, which is 443.
Your decision to proxy or not should primarily be driven by which features of Dataverse you’d like to use. If you’d like to use Shibboleth, the decision is easy because proxying or “fronting” Glassfish with Apache is required. The details are covered in the Shibboleth section.
If you’d like to use TwoRavens, you should also consider fronting with Apache because you will be required to install an Apache anyway to make use of the rApache module. For details, see the TwoRavens section.
Even if you have no interest in Shibboleth nor TwoRavens, you may want to front Dataverse with Apache or nginx to simply the process of installing SSL certificates. There are many tutorials on the Internet for adding certs to Apache, including a some notes used by the Dataverse team, but the process of adding a certificate to Glassfish is arduous and not for the faint of heart. The Dataverse team cannot provide much help with adding certificates to Glassfish beyond linking to tips on the web.
Still not convinced you should put Glassfish behind another web server? Even if you manage to get your SSL certificate into Glassfish, how are you going to run Glassfish on low ports such as 80 and 443? Are you going to run Glassfish as root? Bad idea. This is a security risk. Under “Additional Recommendations” under “Securing Your Installation” above you are advised to configure Glassfish to run as a user other than root. (The Dataverse team will close https://github.com/IQSS/dataverse/issues/1934 after updating the Glassfish init script provided in the Prerequisites section to not require root.)
There’s also the issue of serving a production-ready version of robots.txt. By using a proxy such as Apache, this is a one-time “set it and forget it” step as explained below in the “Going Live” section.
If you are convinced you’d like to try fronting Glassfish with Apache, the Shibboleth section should be good resource for you.
If you really don’t want to front Glassfish with any proxy (not recommended), you can configure Glassfish to run HTTPS on port 443 like this:
./asadmin set server-config.network-config.network-listeners.network-listener.http-listener-2.port=443
What about port 80? Even if you don’t front Dataverse with Apache, you may want to let Apache run on port 80 just to rewrite HTTP to HTTPS as described above. You can use a similar command as above to change the HTTP port that Glassfish uses from 8080 to 80 (substitute http-listener-1.port=80
). Glassfish can be used to enforce HTTPS on its own without Apache, but configuring this is an exercise for the reader. Answers here may be helpful: http://stackoverflow.com/questions/25122025/glassfish-v4-java-7-port-unification-error-not-able-to-redirect-http-to
If you are running an installation with Apache and Glassfish on the same server, and would like to restrict Glassfish from responding to any requests to port 8080 from external hosts (in other words, not through Apache), you can restrict the AJP listener to localhost only with:
./asadmin set server-config.network-config.network-listeners.network-listener.http-listener-1.address=127.0.0.1
You should NOT use the configuration option above if you are running in a load-balanced environment, or otherwise have the web server on a different host than the application server.
By default, a Dataverse installation stores data files (files uploaded by end users) on the filesystem at /usr/local/glassfish4/glassfish/domains/domain1/files
but this path can vary based on answers you gave to the installer (see the Running the Dataverse Installer section of the Installation Guide) or afterward by reconfiguring the dataverse.files.directory
JVM option described below.
Rather than storing data files on the filesystem, you can opt for an experimental setup with a Swift Object Storage backend. Each dataset that users create gets a corresponding “container” on the Swift side, and each data file is saved as a file within that container.
In order to configure a Swift installation, there are two steps you need to complete:
First, create a file named swift.properties
as follows in the config
directory for your installation of Glassfish (by default, this would be /usr/local/glassfish4/glassfish/domains/domain1/config/swift.properties
):
swift.default.endpoint=endpoint1
swift.auth_type.endpoint1=your-authentication-type
swift.auth_url.endpoint1=your-auth-url
swift.tenant.endpoint1=your-tenant-name
swift.username.endpoint1=your-username
swift.password.endpoint1=your-password
swift.swift_endpoint.endpoint1=your-swift-endpoint
auth_type
can either be keystone
, keystone_v3
, or it will assumed to be basic
. auth_url
should be your keystone authentication URL which includes the tokens (e.g. for keystone, https://openstack.example.edu:35357/v2.0/tokens
and for keystone_v3, https://openstack.example.edu:35357/v3/auth/tokens
). swift_endpoint
is a URL that look something like http://rdgw.swift.example.org/swift/v1
.
Second, update the JVM option dataverse.files.storage-driver-id
by running the delete command:
./asadmin $ASADMIN_OPTS delete-jvm-options "\-Ddataverse.files.storage-driver-id=file"
Then run the create command:
./asadmin $ASADMIN_OPTS create-jvm-options "\-Ddataverse.files.storage-driver-id=swift"
You also have the option to set a custom container name separator. It is initialized to _
, but you can change it by running the create command:
./asadmin $ASADMIN_OPTS create-jvm-options "\-Ddataverse.files.swift-folder-path-separator=-"
By default, your Swift installation will be public-only, meaning users will be unable to put access restrictions on their data. If you are comfortable with this level of privacy, the final step in your setup is to set the :PublicInstall setting to true.
In order to enable file access restrictions, you must enable Swift to use temporary URLs for file access. To enable usage of temporary URLs, set a hash key both on your swift endpoint and in your swift.properties file. You can do so by adding
swift.hash_key.endpoint1=your-hash-key
to your swift.properties file.
You also have the option to set a custom expiration length for a generated temporary URL. It is initialized to 60 seconds, but you can change it by running the create command:
./asadmin $ASADMIN_OPTS create-jvm-options "\-Ddataverse.files.temp_url_expire=3600"
In this example, you would be setting the expiration length for one hour.
Once you have configured a Swift Object Storage backend, you also have the option of enabling a connection to a computing environment. To do so, you need to configure the database settings for :ComputeBaseUrl and :CloudEnvironmentName.
Once you have set up :ComputeBaseUrl
properly in both Dataverse and your cloud environment, validated users will have three options for accessing the computing environment:
- Compute on a single dataset
- Compute on multiple datasets
- Compute on a single datafile
The compute buttons on dataset and file pages will link validated users to your computing environment. If a user is computing on one dataset, the compute button will redirect to:
:ComputeBaseUrl?datasetPersistentId
If a user is computing on multiple datasets, the compute button will redirect to:
:ComputeBaseUrl/multiparty?datasetPersistentId&anotherDatasetPersistentId&anotherDatasetPersistentId&...
If a user is computing on a single file, depending on the configuration of your installation, the compute button will either redirect to:
:ComputeBaseUrl?datasetPersistentId=yourObject
if your installation’s :PublicInstall setting is true, or:
:ComputeBaseUrl?datasetPersistentId=yourObject&temp_url_sig=yourTempUrlSig&temp_url_expires=yourTempUrlExpiry
You can configure this redirect properly in your cloud environment to generate a temporary URL for access to the Swift objects for computing.
For institutions and organizations looking to use some kind of S3-based object storage for files uploaded to Dataverse,
this is entirely possible. You can either use Amazon Web Services or use some other, even on-site S3-compatible
storage (like Minio, Ceph RADOS S3 Gateway and many more).
Note: The Dataverse Team is most familiar with AWS S3, and can provide support on its usage with Dataverse. Thanks to community contributions, the application’s architecture also allows non-AWS S3 providers. The Dataverse Team can provide very limited support on these other providers. We recommend reaching out to the wider Dataverse community if you have questions.
Dataverse and the AWS SDK make use of the “AWS credentials profile file” and “AWS config profile file” located in
~/.aws/
where ~
is the home directory of the user you run Glassfish as. This file can be generated via either
of two methods described below:
- Manually through creation of the credentials and config files or
- Automatically via the AWS console commands.
Preparation When Using Amazon’s S3 Service
You’ll need an AWS account with an associated S3 bucket for your installation to use. From the S3 management console
(e.g. https://console.aws.amazon.com/), you can poke around and get familiar with your bucket.
Make note of the bucket’s name and the region its data is hosted in.
To create a user with full S3 access and nothing more for security reasons, we recommend using IAM
(Identity and Access Management). See IAM User Guide
for more info on this process.
Generate the user keys needed for Dataverse afterwards by clicking on the created user.
(You can skip this step when running on EC2, see below.)
Preparation When Using Custom S3-Compatible Service
We assume you have your S3-compatible custom storage in place, up and running, ready for service.
Please make note of the following details:
Endpoint URL - consult the documentation of your service on how to find it.
Region: Optional, but some services might use it. Consult your service documentation.
Access key ID and secret access key: Usually you can generate access keys within the user profile of your service.
- Example:
- ID: Q3AM3UQ867SPQQA43P2F
- Key: zuf+tfteSlswRu7BJ86wekitnifILbZam1KYY3TG
Bucket name: Dataverse will fail opening and uploading files on S3 if you don’t create one.
Manually Set Up Credentials File
To create the ~/.aws/credentials
file manually, you will need to generate a key/secret key (see above). Once you have
acquired the keys, they need to be added to the credentials
file. The format for credentials is as follows:
[default]
aws_access_key_id = <insert key, no brackets>
aws_secret_access_key = <insert secret key, no brackets>
While using Amazon’s service, you must also specify the AWS region in the ~/.aws/config
file, for example:
[default]
region = us-east-1
Place these two files in a folder named .aws
under the home directory for the user running your Dataverse Glassfish
instance. (From the AWS Command Line Interface Documentation:
“In order to separate credentials from less sensitive options, region and output format are stored in a separate file
named config in the same folder”)
Console Commands to Set Up Access Configuration
Begin by installing the CLI tool pip to install the
AWS command line interface if you don’t have it.
First, we’ll get our access keys set up. If you already have your access keys configured, skip this step.
From the command line, run:
pip install awscli
aws configure
You’ll be prompted to enter your Access Key ID and secret key, which should be issued to your AWS account.
The subsequent config steps after the access keys are up to you. For reference, the keys will be stored in
~/.aws/credentials
, and your AWS access region in ~/.aws/config
.
- TIP: When using a custom S3 URL endpoint, you need to add it to every
aws
call: aws --endpoint-url <URL> s3 ...
- (you may omit it while configuring).
With access to your bucket in place, we’ll want to navigate to /usr/local/glassfish4/glassfish/bin/
and execute the following asadmin
commands to set up the proper JVM options. Recall that out of the box, Dataverse
is configured to use local file storage. You’ll need to delete the existing storage driver before setting the new one.
./asadmin $ASADMIN_OPTS delete-jvm-options "-Ddataverse.files.storage-driver-id=file"
./asadmin $ASADMIN_OPTS create-jvm-options "-Ddataverse.files.storage-driver-id=s3"
Then, we’ll need to identify which S3 bucket we’re using. Replace your_bucket_name
with, of course, your bucket:
./asadmin create-jvm-options "-Ddataverse.files.s3-bucket-name=your_bucket_name"
Optionally, you can have users download files from S3 directly rather than having files pass from S3 through Glassfish to your users. To accomplish this, set dataverse.files.s3-download-redirect
to true
like this:
./asadmin create-jvm-options "-Ddataverse.files.s3-download-redirect=true"
If you enable dataverse.files.s3-download-redirect
as described above, note that the S3 URLs expire after an hour by default but you can configure the expiration time using the dataverse.files.s3-url-expiration-minutes
JVM option. Here’s an example of setting the expiration time to 120 minutes:
./asadmin create-jvm-options "-Ddataverse.files.s3-url-expiration-minutes=120"
In case you would like to configure Dataverse to use a custom S3 service instead of Amazon S3 services, please
add the options for the custom URL and region as documented below. Please read above if your desired combination has
been tested already and what other options have been set for a successful integration.
Lastly, go ahead and restart your glassfish server. With Dataverse deployed and the site online, you should be able to upload datasets and data files and see the corresponding files in your S3 bucket. Within a bucket, the folder structure emulates that found in local file storage.
S3 Storage Options
JVM Option |
Value |
Description |
Default value |
dataverse.files.storage-driver-id |
s3 |
Enable S3 storage driver. |
file |
dataverse.files.s3-bucket-name |
<?> |
The bucket name. See above. |
(none) |
dataverse.files.s3-download-redirect |
true /false |
Enable direct download or proxy through Dataverse. |
false |
dataverse.files.s3-url-expiration-minutes |
<?> |
If direct downloads: time until links expire. Optional. |
60 |
dataverse.files.s3-custom-endpoint-url |
<?> |
Use custom S3 endpoint. Needs URL either with or without protocol. |
(none) |
dataverse.files.s3-custom-endpoint-region |
<?> |
Only used when using custom endpoint. Optional. |
dataverse |
dataverse.files.s3-path-style-access |
true /false |
Use path style buckets instead of subdomains. Optional. |
false |
JVM stands Java Virtual Machine and as a Java application, Glassfish can read JVM options when it is started. A number of JVM options are configured by the installer below is a complete list of the Dataverse-specific JVM options. You can inspect the configured options by running:
./asadmin list-jvm-options | egrep 'dataverse|doi'
When changing values these values with asadmin
, you’ll need to delete the old value before adding a new one, like this:
./asadmin delete-jvm-options "-Ddataverse.fqdn=old.example.com"
./asadmin create-jvm-options "-Ddataverse.fqdn=dataverse.example.com"
It’s also possible to change these values by stopping Glassfish, editing glassfish4/glassfish/domains/domain1/config/domain.xml
, and restarting Glassfish.
If the Dataverse server has multiple DNS names, this option specifies the one to be used as the “official” host name. For example, you may want to have dataverse.example.edu, and not the less appealing server-123.socsci.example.edu to appear exclusively in all the registered global identifiers, Data Deposit API records, etc.
The password reset feature requires dataverse.fqdn
to be configured.
Do note that whenever the system needs to form a service URL, by default, it will be formed with https://
and port 443. I.e.,
https://{dataverse.fqdn}/
If that does not suit your setup, you can define an additional option, dataverse.siteUrl
, explained below.
and specify the protocol and port number you would prefer to be used to advertise the URL for your Dataverse.
For example, configured in domain.xml:
<jvm-options>-Ddataverse.fqdn=dataverse.example.edu</jvm-options>
<jvm-options>-Ddataverse.siteUrl=http://${dataverse.fqdn}:8080</jvm-options>
This is how you configure the path to which files uploaded by users are stored.
Users have 60 minutes to change their passwords by default. You can adjust this value here.
Dropbox provides a Chooser app, which is a Javascript component that allows you to upload files to Dataverse from Dropbox. It is an optional configuration setting, which requires you to pass it an app key and configure the :UploadMethods
database setting. For more information on setting up your Chooser app, visit https://www.dropbox.com/developers/chooser.
./asadmin create-jvm-options "-Ddataverse.dropbox.key={{YOUR_APP_KEY}}"
For overriding the default path to the convert
binary from ImageMagick (/usr/bin/convert
).
For limiting the size (in bytes) of thumbnail images generated from files.
For limiting the size (in bytes) of thumbnail images generated from files.
As of this writing, “https://mds.datacite.org” (DataCite) and “https://ezid.cdlib.org” (EZID) are the main valid values.
While the above two options are recommended because they have been tested by the Dataverse team, it is also possible to use a DataCite Client API as a proxy to DataCite. In this case, requests made to the Client API are captured and passed on to DataCite for processing. The application will interact with the DataCite Client API exactly as if it were interacting directly with the DataCite API, with the only difference being the change to the base endpoint URL.
For example, the Australian Data Archive (ADA) successfully uses the Australian National Data Service (ANDS) API (a proxy for DataCite) to mint their DOIs through Dataverse using a doi.baseurlstring
value of “https://researchdata.ands.org.au/api/doi/datacite” as documented at https://documentation.ands.org.au/display/DOC/ANDS+DataCite+Client+API . As ADA did for ANDS DOI minting, any DOI provider (and their corresponding DOI configuration parameters) other than DataCite must be tested with Dataverse to establish whether or not it will function properly.
Out of the box, Dataverse is configured to use a test MDS DataCite base URL string. You can delete it like this:
./asadmin delete-jvm-options '-Ddoi.baseurlstring=https\://mds.test.datacite.org'
Then, to switch to production DataCite, you can issue the following command:
./asadmin create-jvm-options '-Ddoi.baseurlstring=https\://mds.datacite.org'
See also these related database settings below:
Used in conjuction with doi.baseurlstring
.
Once you have a username from your provider, you can enter it like this:
./asadmin create-jvm-options '-Ddoi.username=YOUR_USERNAME_HERE'
Used in conjuction with doi.baseurlstring
.
Once you have a password from your provider, you can enter it like this:
./asadmin create-jvm-options '-Ddoi.password=YOUR_PASSWORD_HERE'
If you’re using handles, this JVM setting configures access credentials so your dataverse can talk to your Handle.Net server. This is the private key generated during Handle.Net server installation. Typically the full path is set to handle/svr_1/admpriv.bin
. Please refer to Handle.Net’s documentation for more info.
This JVM setting is also part of handles configuration. The Handle.Net installer lets you choose whether to encrypt the admcredfile private key or not. If you do encrypt it, this is the pass phrase that it’s encrypted with.
If you want to use different index than the default 300
This JVM option is only relevant if you plan to run multiple Glassfish servers for redundancy. Only one Glassfish server can act as the dedicated timer server and for details on promoting or demoting a Glassfish server to handle this responsibility, see Dataverse Application Timers.
This JVM option is used to configure the path where all the language specific property files are to be stored. If this option is set then the english property file must be present in the path along with any other language property file.
./asadmin create-jvm-options '-Ddataverse.lang.directory=PATH_LOCATION_HERE'
If this value is not set, by default, a Dataverse installation will read the English language property files from the Java Application.
Please note that this setting is experimental.
By default, download URLs to files will be included in Schema.org JSON-LD output. To prevent these URLs from being included in the output, set dataverse.files.hide-schema-dot-org-download-urls
to true as in the example below.
./asadmin create-jvm-options '-Ddataverse.files.hide-schema-dot-org-download-urls=true'
Please note that there are other reasons why download URLs may not be included for certain files such as if a guestbook entry is required or if the file is restricted.
For more on Schema.org JSON-LD, see the Metadata Export section of the Admin Guide.
These settings are stored in the setting
database table but can be read and modified via the “admin” endpoint of the Native API for easy scripting.
The most commonly used configuration options are listed first.
The pattern you will observe in curl examples below is that an HTTP PUT
is used to add or modify a setting. If you perform an HTTP GET
(the default when using curl), the output will contain the value of the setting, if it has been set. You can also do a GET
of all settings with curl http://localhost:8080/api/admin/settings
which you may want to pretty-print by piping the output through a tool such as jq by appending | jq .
. If you want to remove a setting, use an HTTP DELETE
such as curl -X DELETE http://localhost:8080/api/admin/settings/:GuidesBaseUrl
.
Out of the box, all API endpoints are completely open, as mentioned in the section on security above. It is highly recommended that you choose one of the policies below and also configure :BlockedApiEndpoints
.
- localhost-only: Allow from localhost.
- unblock-key: Require a key defined in
:BlockedApiKey
.
- drop: Disallow the blocked endpoints completely.
curl -X PUT -d localhost-only http://localhost:8080/api/admin/settings/:BlockedApiPolicy
A comma separated list of API endpoints to be blocked. For a production installation, “admin” should be blocked (and perhaps “builtin-users” as well), as mentioned in the section on security above:
curl -X PUT -d "admin,builtin-users" http://localhost:8080/api/admin/settings/:BlockedApiEndpoints
See the API Guide for a list of API endpoints.
Used in conjunction with the :BlockedApiPolicy
being set to unblock-key
. When calling blocked APIs, add a query parameter of unblock-key=theKeyYouChose
to use the key.
curl -X PUT -d s3kretKey http://localhost:8080/api/admin/settings/:BlockedApiKey
The key required to create users via API as documented at Native API. Unlike other database settings, this one doesn’t start with a colon.
curl -X PUT -d builtInS3kretKey http://localhost:8080/api/admin/settings/BuiltinUsers.KEY
In Dataverse 4.7 and lower, the Search API required an API token, but as of Dataverse 4.7.1 this is no longer the case. If you prefer the old behavior of requiring API tokens to use the Search API, set :SearchApiRequiresToken
to true
.
curl -X PUT -d true http://localhost:8080/api/admin/settings/:SearchApiRequiresToken
This is the email address that “system” emails are sent from such as password reset links. Your Dataverse installation will not send mail without this setting in place.
curl -X PUT -d 'LibraScholar SWAT Team <support@librascholar.edu>' http://localhost:8080/api/admin/settings/:SystemEmail
Note that only the email address is required, which you can supply without the <
and >
signs, but if you include the text, it’s the way to customize the name of your support team, which appears in the “from” address in emails as well as in help text in the UI.
Please note that if you’re having any trouble sending email, you can refer to “Troubleshooting” under Installation.
As of this writing “DataCite” and “EZID” are the only valid options for production installations. Developers are welcome to use “FAKE”. :DoiProvider
is only needed if you are using DOI.
curl -X PUT -d DataCite http://localhost:8080/api/admin/settings/:DoiProvider
This setting relates to the :Protocol
, :Authority
, :Shoulder
, and :IdentifierGenerationStyle
database settings below as well as the following JVM options:
As of this writing “doi” and “hdl” are the only valid option for the protocol for a persistent ID.
curl -X PUT -d doi http://localhost:8080/api/admin/settings/:Protocol
Use the authority assigned to you by your DoiProvider or HandleProvider.
Please note that the authority cannot have a slash (“/”) in it.
curl -X PUT -d 10.xxxx http://localhost:8080/api/admin/settings/:Authority
Out of the box, the DOI shoulder is set to “FK2/” but this is for testing only! When you apply for your DOI namespace, you may have requested a shoulder. The following is only an example and a trailing slash is optional.
curl -X PUT -d "MyShoulder/" http://localhost:8080/api/admin/settings/:Shoulder
By default, Dataverse generates a random 6 character string, pre-pended by the Shoulder if set, to use as the identifier
for a Dataset. Set this to sequentialNumber
to use sequential numeric values
instead (again pre-pended by the Shoulder if set). (the assumed default setting is randomString
).
In addition to this setting, a database sequence must be created in the database.
We provide the script below (downloadable here
).
You may need to make some changes to suit your system setup, see the comments for more information:
-- A script for creating a numeric identifier sequence, and an external
-- stored procedure, for accessing the sequence from inside the application,
-- in a non-hacky, JPA way.
-- NOTE:
-- 1. The database user name "dvnapp" is hard-coded here - it may
-- need to be changed to match your database user name;
-- 2. In the code below, the sequence starts with 1, but it can be adjusted by
-- changing the MINVALUE as needed.
CREATE SEQUENCE datasetidentifier_seq
INCREMENT 1
MINVALUE 1
MAXVALUE 9223372036854775807
START 1
CACHE 1;
ALTER TABLE datasetidentifier_seq OWNER TO "dvnapp";
-- And now create a PostgreSQL FUNCTION, for JPA to
-- access as a NamedStoredProcedure:
CREATE OR REPLACE FUNCTION generateIdentifierAsSequentialNumber(
OUT identifier int)
RETURNS int AS
$BODY$
BEGIN
select nextval('datasetidentifier_seq') into identifier;
END;
$BODY$
LANGUAGE plpgsql;
Note that the SQL above is Postgres-specific. If necessary, it can be reimplemented
in any other SQL flavor - the standard JPA code in the application simply expects
the database to have a saved function (“stored procedure”) named generateIdentifierAsSequentialNumber
with the single return argument identifier
.
For systems using Postgresql 8.4 or older, the procedural language plpgsql should be enabled first.
We have provided an example here
.
Please note that :IdentifierGenerationStyle
also plays a role for the “identifier” for files. See the section on :DataFilePIDFormat
below for more details.
This setting controls the way that the “identifier” component of a file’s persistent identifier (PID) relates to the PID of its “parent” dataset.
By default the identifier for a file is dependent on its parent dataset. For example, if the identifier of a dataset is “TJCLKP”, the identifier for a file within that dataset will consist of the parent dataset’s identifier followed by a slash (“/”), followed by a random 6 character string, yielding “TJCLKP/MLGWJO”. Identifiers in this format are what you should expect if you leave :DataFilePIDFormat
undefined or set it to DEPENDENT
and have not changed the :IdentifierGenerationStyle
setting from its default.
Alternatively, the identifier for File PIDs can be configured to be independent of Dataset PIDs using the setting “INDEPENDENT
”. In this case, file PIDs will not contain the PIDs of their parent datasets, and their PIDs will be generated the exact same way that datasets’ PIDs are, based on the :IdentifierGenerationStyle
setting described above (random 6 character strings or sequential numbers, pre-pended by any shoulder).
The chart below shows examples from each possible combination of parameters from the two settings. :IdentifierGenerationStyle
can be either randomString
(the default) or sequentialNumber
and :DataFilePIDFormat
can be either DEPENDENT
(the default) or INDEPENDENT
. In the examples below the “identifier” for the dataset is “TJCLKP” for “randomString” and “100001” for “sequentialNumber”.
|
randomString |
sequentialNumber |
DEPENDENT |
TJCLKP/MLGWJO |
100001/1 |
INDEPENDENT |
MLGWJO |
100002 |
As seen above, in cases where :IdentifierGenerationStyle
is set to sequentialNumber and :DataFilePIDFormat
is set to DEPENDENT, each file within a dataset will be assigned a number within that dataset starting with “1”.
Otherwise, if :DataFilePIDFormat
is set to INDEPENDENT, then each file will be assigned a PID with the next number in the overall sequence, regardless of what dataset it is in. If the file is created after a dataset with the PID 100001, then the file will be assigned the PID 100002. This option is functional, but it is not a recommended use case.
Note that in either case, when using the sequentialNumber
option, datasets and files share the same database sequence that was created as part of the setup described in :IdentifierGenerationStyle
above.
Toggles publishing of file-based PIDs for the entire installation. By default this setting is absent and Dataverse assumes it to be true.
If you don’t want to register file-based PIDs for your installation, set:
curl -X PUT -d 'false' http://localhost:8080/api/admin/settings/:FilePIDsEnabled
Note: File-level PID registration was added in 4.9 and is required until version 4.9.3.
Upload an HTML file containing the Terms of Use to be displayed at sign up. Supported HTML tags are listed under the Dataset + File Management section of the User Guide.
curl -X PUT -d@/tmp/apptou.html http://localhost:8080/api/admin/settings/:ApplicationTermsOfUse
Unfortunately, in most cases, the text file will probably be too big to upload (>1024 characters) due to a bug. A workaround has been posted to https://github.com/IQSS/dataverse/issues/2669
Specify a URL where users can read your Privacy Policy, linked from the bottom of the page.
curl -X PUT -d https://dataverse.org/best-practices/harvard-dataverse-privacy-policy http://localhost:8080/api/admin/settings/:ApplicationPrivacyPolicyUrl
Specify a URL where users can read your API Terms of Use.
API users can retrieve this URL from the SWORD Service Document or the “info” section of our Native API documentation.
curl -X PUT -d https://dataverse.org/best-practices/harvard-api-tou http://localhost:8080/api/admin/settings/:ApiTermsOfUse
Set :ExcludeEmailFromExport
to prevent email addresses for dataset contacts from being exposed in XML or JSON representations of dataset metadata. For a list exported formats such as DDI, see the Metadata Export section of the Admin Guide.
curl -X PUT -d true http://localhost:8080/api/admin/settings/:ExcludeEmailFromExport
Set NavbarAboutUrl
to a fully-qualified URL which will be used for the “About” link in the navbar.
Note: The “About” link will not appear in the navbar until this option is set.
curl -X PUT -d http://dataverse.example.edu http://localhost:8080/api/admin/settings/:NavbarAboutUrl
Set GuidesBaseUrl
to override the default value “http://guides.dataverse.org”. If you are interested in writing your own version of the guides, you may find the Documentation section of the Developer Guide helpful.
curl -X PUT -d http://dataverse.example.edu http://localhost:8080/api/admin/settings/:GuidesBaseUrl
Set :NavbarSupportUrl
to a fully-qualified URL which will be used for the “Support” link in the navbar.
Note that this will override the default behaviour for the “Support” menu option, which is to display the dataverse ‘feedback’ dialog.
curl -X PUT -d http://dataverse.example.edu/supportpage.html http://localhost:8080/api/admin/settings/:NavbarSupportUrl
Make the metrics component on the root dataverse a clickable link to a website where you present metrics on your Dataverse installation. This could perhaps be an installation of https://github.com/IQSS/miniverse or any site.
curl -X PUT -d http://metrics.dataverse.example.edu http://localhost:8080/api/admin/settings/:MetricsUrl
Alongside the :StatusMessageHeader
you need to add StatusMessageText for the message to show.:
curl -X PUT -d "This appears in a popup." http://localhost:8080/api/admin/settings/:StatusMessageText
Set MaxFileUploadSizeInBytes to “2147483648”, for example, to limit the size of files uploaded to 2 GB.
Notes:
- For SWORD, this size is limited by the Java Integer.MAX_VALUE of 2,147,483,647. (see: https://github.com/IQSS/dataverse/issues/2169)
- If the MaxFileUploadSizeInBytes is NOT set, uploads, including SWORD may be of unlimited size.
- For larger file upload sizes, you may need to configure your reverse proxy timeout. If using apache2 (httpd) with Shibboleth, add a timeout to the ProxyPass defined in etc/httpd/conf.d/ssl.conf (which is described in the Shibboleth setup).
curl -X PUT -d 2147483648 http://localhost:8080/api/admin/settings/:MaxFileUploadSizeInBytes
For performance reasons, Dataverse will only create zip files on the fly up to 100 MB but the limit can be increased. Here’s an example of raising the limit to 1 GB:
curl -X PUT -d 1000000000 http://localhost:8080/api/admin/settings/:ZipDownloadLimit
Threshold in bytes for limiting whether or not “ingest” it attempted for tabular files (which can be resource intensive). For example, with the below in place, files greater than 2 GB in size will not go through the ingest process:
curl -X PUT -d 2000000000 http://localhost:8080/api/admin/settings/:TabularIngestSizeLimit
(You can set this value to 0 to prevent files from being ingested at all.)
You can override this global setting on a per-format basis for the following formats:
- DTA
- POR
- SAV
- Rdata
- CSV
- XLSX
For example, if you want your installation of Dataverse to not attempt to ingest Rdata files larger that 1 MB, use this setting:
curl -X PUT -d 1000000 http://localhost:8080/api/admin/settings/:TabularIngestSizeLimit:Rdata
Limit the number of files in a zip that Dataverse will accept.
Set your Google Analytics Tracking ID thusly:
curl -X PUT -d 'trackingID' http://localhost:8080/api/admin/settings/:GoogleAnalyticsCode
By default Dataverse will attempt to connect to Solr on port 8983 on localhost. Use this setting to change the hostname or port. You must restart Glassfish after making this change.
curl -X PUT -d localhost:8983 http://localhost:8080/api/admin/settings/:SolrHostColonPort
Whether or not to index the content of files such as PDFs. The default is false.
curl -X PUT -d true http://localhost:8080/api/admin/settings/:SolrFullTextIndexing
If :SolrFullTextIndexing
is set to true, the content of files of any size will be indexed. To set a limit in bytes for which files to index in this way:
curl -X PUT -d 314572800 http://localhost:8080/api/admin/settings/:SolrMaxFileSizeForFullTextIndexing
The relative path URL to which users will be sent for signup. The default setting is below.
curl -X PUT -d '/dataverseuser.xhtml?editMode=CREATE' http://localhost:8080/api/admin/settings/:SignUpUrl
Set GeoconnectCreateEditMaps
to true to allow the user to create GeoConnect Maps. This boolean effects whether the user sees the map button on the dataset page and if the ingest will create a shape file.
curl -X PUT -d true http://localhost:8080/api/admin/settings/:GeoconnectCreateEditMaps
Set GeoconnectViewMaps
to true to allow a user to view existing maps. This boolean effects whether a user will see the “Explore” button.
curl -X PUT -d true http://localhost:8080/api/admin/settings/:GeoconnectViewMaps
Allow for migration of non-conformant data (especially dates) from DVN 3.x to Dataverse 4.
The duration in minutes before “Confirm Email” URLs expire. The default is 1440 minutes (24 hours). See also the User Administration section of our Admin Guide.
If you have enabled Shibboleth and/or one or more OAuth providers, you may wish to make one of these authentication providers the default when users visit the Log In page. If unset, this will default to builtin
but these valid options (depending if you’ve done the setup described in the Shibboleth or OAuth Login: ORCID, GitHub, Google sections) are:
builtin
shib
orcid
github
google
Here is an example of setting the default auth provider back to builtin
:
curl -X PUT -d builtin http://localhost:8080/api/admin/settings/:DefaultAuthProvider
Site identifier created in your Piwik instance. Example:
curl -X PUT -d 42 http://localhost:8080/api/admin/settings/:PiwikAnalyticsId
Host FQDN or URL of your Piwik instance before the /piwik.php
. Examples:
curl -X PUT -d stats.domain.tld http://localhost:8080/api/admin/settings/:PiwikAnalyticsHost
or
curl -X PUT -d hostname.domain.tld/stats http://localhost:8080/api/admin/settings/:PiwikAnalyticsHost
Filename for the ‘php’ and ‘js’ tracker files used in the Piwik code (piwik.php and piwik.js).
Sometimes these files are renamed in order to prevent ad-blockers (in the browser) to block the Piwik tracking code.
This sets the base name (without dot and extension), if not set it defaults to ‘piwik’.
curl -X PUT -d domainstats http://localhost:8080/api/admin/settings/:PiwikAnalyticsTrackerFileName
Dataverse calculates checksums for uploaded files so that users can determine if their file was corrupted via upload or download. This is sometimes called “file fixity”: https://en.wikipedia.org/wiki/File_Fixity
The default checksum algorithm used is MD5 and should be sufficient for establishing file fixity. “SHA-1”, “SHA-256” and “SHA-512” are alternate values for this setting. For example:
curl -X PUT -d 'SHA-512' http://localhost:8080/api/admin/settings/:FileFixityChecksumAlgorithm
The fixity algorithm used on existing files can be changed by a superuser using the API. An optional query parameter (num) can be used to limit the number of updates attempted.
The API call will only update the algorithm and checksum for a file if the existing checksum can be validated against the file.
Statistics concerning the updates are returned in the response to the API call with details in the log.
curl http://localhost:8080/api/admin/updateHashValues/{alg}
curl http://localhost:8080/api/admin/updateHashValues/{alg}?num=1
Password policy setting for builtin user accounts: a password’s minimum valid character length. The default is 6.
curl -X PUT -d 6 http://localhost:8080/api/admin/settings/:PVMinLength
Password policy setting for builtin user accounts: a password’s maximum valid character length.
curl -X PUT -d 0 http://localhost:8080/api/admin/settings/:PVMaxLength
By default, passwords can contain an unlimited number of digits in a row. However, if your password policy specifies otherwise (e.g. only four digits in a row are allowed), then you can issue the following curl command to set the number of consecutive digits allowed (this example uses 4):
curl -X PUT -d 4 http://localhost:8080/api/admin/settings/:PVNumberOfConsecutiveDigitsAllowed
Password policy setting for builtinuser accounts: dictates which types of characters can be required in a password. This setting goes hand-in-hand with :PVNumberOfCharacteristics. The default setting contains two rules:
The default setting above is equivalent to specifying “Alphabetical:1,Digit:1”.
By specifying “UpperCase:1,LowerCase:1,Digit:1,Special:1”, for example, you can put the following four rules in place instead:
- one uppercase letter
- one lowercase letter
- one digit
- one special character
If you have implemented 4 different character rules in this way, you can also optionally increase :PVNumberOfCharacteristics
to as high as 4. However, please note that :PVNumberOfCharacteristics
cannot be set to a number higher than the number of character rules or you will see the error, “Number of characteristics must be <= to the number of rules”.
Also note that the Alphabetical setting should not be used in tandem with the UpperCase or LowerCase settings. The Alphabetical setting encompasses both of those more specific settings, so using it with them will cause your password policy to be unnecessarily confusing, and potentially easier to bypass.
curl -X PUT -d 'UpperCase:1,LowerCase:1,Digit:1,Special:1' http://localhost:8080/api/admin/settings/:PVCharacterRules
curl -X PUT -d 3 http://localhost:8080/api/admin/settings/:PVNumberOfCharacteristics
Password policy setting for builtin user accounts: the number indicates how many of the character rules defined by :PVCharacterRules
are required as part of a password. The default is 2. :PVNumberOfCharacteristics
cannot be set to a number higher than the number of rules or you will see the error, “Number of characteristics must be <= to the number of rules”.
curl -X PUT -d 2 http://localhost:8080/api/admin/settings/:PVNumberOfCharacteristics
Password policy setting for builtin user accounts: set a comma separated list of dictionaries containing words that cannot be used in a user password. /usr/share/dict/words
is suggested and shown modified below to not contain words 3 letters or less. You are free to choose a different dictionary. By default, no dictionary is checked.
DIR=THE_PATH_YOU_WANT_YOUR_DICTIONARY_TO_RESIDE
sed '/^.\{,3\}$/d' /usr/share/dict/words > $DIR/pwdictionary
curl -X PUT -d "$DIR/pwdictionary" http://localhost:8080/api/admin/settings/:PVDictionaries
Password policy setting for builtin user accounts: passwords of equal or greater character length than the :PVGoodStrength setting are always valid, regardless of other password constraints.
curl -X PUT -d 20 http://localhost:8080/api/admin/settings/:PVGoodStrength
Recommended setting: 20.
Changes the default info message displayed when a user is required to change their password on login. The default is:
{0} Reset Password{1} – Our password requirements have changed. Please pick a strong password that matches the criteria below.
Where the {0} and {1} denote surrounding HTML bold tags. It’s recommended to put a single space before your custom message for better appearance (as in the default message above). Including the {0} and {1} to bolden part of your message is optional.
Customize the message using the following curl command’s syntax:
curl -X PUT -d '{0} Action Required:{1} Your current password does not meet all requirements. Please enter a new password meeting the criteria below.' http://localhost:8080/api/admin/settings/:PVCustomPasswordResetAlertMessage
Set :ShibPassiveLoginEnabled
to true to enable passive login for Shibboleth. When this feature is enabled, an additional Javascript file (isPassive.js) will be loaded for every page. It will generate a passive login request to your Shibboleth SP when an anonymous user navigates to the site. A cookie named “_check_is_passive_dv” will be created to keep track of whether or not a passive login request has already been made for the user.
This implementation follows the example on the Shibboleth wiki documentation page for the isPassive feature: https://wiki.shibboleth.net/confluence/display/SHIB2/isPassive
It is recommended that you configure additional error handling for your Service Provider if you enable passive login. A good way of doing this is described in the Shibboleth wiki documentation:
- In your Service Provider 2.x shibboleth2.xml file, add redirectErrors=”#THIS PAGE#” to the Errors element.
You can set the value of “#THIS PAGE#” to the URL of your Dataverse homepage, or any other page on your site that is accessible to anonymous users and will have the isPassive.js file loaded.
curl -X PUT -d true http://localhost:8080/api/admin/settings/:ShibPassiveLoginEnabled
Set the base URL for the “Compute” button for a dataset.
curl -X PUT -d 'https://giji.massopencloud.org/application/dataverse' http://localhost:8080/api/admin/settings/:ComputeBaseUrl
Set the name of the cloud environment you’ve integrated with your Dataverse installation.
curl -X PUT -d 'Massachusetts Open Cloud (MOC)' http://localhost:8080/api/admin/settings/:CloudEnvironmentName
Setting an installation to public will remove the ability to restrict data files or datasets. This functionality of Dataverse will be disabled from your installation.
This is useful for specific cases where an installation’s files are stored in public access. Because files stored this way do not obey Dataverse’s file restrictions, users would still be able to access the files even when they’re restricted. In these cases it’s best to use :PublicInstall to disable the feature altogether.
curl -X PUT -d true http://localhost:8080/api/admin/settings/:PublicInstall
The URL for your Data Capture Module (DCM) installation. This component is experimental and can be downloaded from https://github.com/sbgrid/data-capture-module .
curl -X PUT -d 'https://dcm.example.edu' http://localhost:8080/api/admin/settings/:DataCaptureModuleUrl
The URL for your Repository Storage Abstraction Layer (RSAL) installation. This component is experimental and can be downloaded from https://github.com/sbgrid/rsal .
curl -X PUT -d 'https://rsal.example.edu' http://localhost:8080/api/admin/settings/:RepositoryStorageAbstractionLayerUrl
This setting controls which upload methods are available to users of your installation of Dataverse. The following upload methods are available:
native/http
: Corresponds to “Upload with HTTP via your browser” and APIs that use HTTP (SWORD and native).
dcm/rsync+ssh
: Corresponds to “Upload with rsync+ssh via Data Capture Module (DCM)”. A lot of setup is required, as explained in the Big Data Support section of the Dev Guide.
Out of the box only native/http
is enabled and will work without further configuration. To add multiple upload method, separate them using a comma like this:
curl -X PUT -d 'native/http,dcm/rsync+ssh' http://localhost:8080/api/admin/settings/:UploadMethods
You’ll always want at least one upload method, so the easiest way to remove one of them is to simply PUT
just the one you want, like this:
curl -X PUT -d 'native/http' http://localhost:8080/api/admin/settings/:UploadMethods
This setting is experimental and related to Repository Storage Abstraction Layer (RSAL).
curl -X PUT -d 'rsal/rsync' http://localhost:8080/api/admin/settings/:DownloadMethods
Limit on how many guestbook entries to display on the guestbook-responses page. By default, only the 5000 most recent entries will be shown. Use the standard settings API in order to change the limit. For example, to set it to 10,000, make the following API call:
curl -X PUT -d 10000 http://localhost:8080/api/admin/settings/:GuestbookResponsesPageDisplayLimit
You can replace the default dataset metadata fields that are displayed above files table on the dataset page with a custom list separated by commas using the curl command below.
curl http://localhost:8080/api/admin/settings/:CustomDatasetSummaryFields -X PUT -d 'producer,subtitle,alternativeTitle'
You have to put the datasetFieldType name attribute in the :CustomDatasetSummaryFields setting for this to work.
Dataverse 4.8.1 and below allowed API Token lookup via API but for better security this has been disabled by default. Set this to true if you really want the old behavior.
curl -X PUT -d 'true' http://localhost:8080/api/admin/settings/:AllowApiTokenLookupViaApi
Enable the collection of provenance metadata on Dataverse via the provenance popup.
curl -X PUT -d 'true' http://localhost:8080/api/admin/settings/:ProvCollectionEnabled
Sets how long a cached metrics result is used before re-running the query for a request. This timeout is only applied to some of the metrics that query the current state of the system, previous months queries are cached indefinitely. See Metrics API for more info. The default timeout value is 7 days (10080 minutes).
curl -X PUT -d 10080 http://localhost:8080/api/admin/settings/:MetricsCacheTimeoutMinutes
Sets which languages should be available. If there is more than one, a dropdown is displayed
in the header. This should be formated as a JSON array as shown below.
curl http://localhost:8080/api/admin/settings/:Languages -X PUT -d '[{ "locale":"en", "title":"English"}, { "locale":"fr", "title":"Français"}]'
:InheritParentRoleAssignments
can be set to a comma-separated list of role aliases or ‘*’ (all) to cause newly created Dataverses to inherit the set of users and/or internal groups who have assignments for those role(s) on the parent Dataverse, i.e. those users/groups will be assigned the same role(s) on the new Dataverse (in addition to the creator of the new Dataverse having an admin role).
This can be helpful in situations where multiple organizations are sharing one Dataverse instance. The default, if ::InheritParentRoleAssignments
is not set is for the creator of the new Dataverse to be the only one assigned a role.
curl -X PUT -d 'admin, curator' http://localhost:8080/api/admin/settings/:InheritParentRoleAssignments
or
curl -X PUT -d '*' http://localhost:8080/api/admin/settings/:InheritParentRoleAssignments