Chapter 9 Batch compilation (ocamlc)
This chapter describes the OCaml batch compiler ocamlc, which compiles OCaml source files to bytecode object files and links these object files to produce standalone bytecode executable files. These executable files are then run by the bytecode interpreter ocamlrun.
9.1 Overview of the compiler
The ocamlc command has a command-line interface similar to the one of most C compilers. It accepts several types of arguments and processes them sequentially, after all options have been processed:
- Arguments ending in .mli are taken to be source files for compilation unit interfaces. Interfaces specify the names exported by compilation units: they declare value names with their types, define public data types, declare abstract data types, and so on. From the file x.mli, the ocamlc compiler produces a compiled interface in the file x.cmi.
- Arguments ending in .ml are taken to be source files for compilation unit implementations. Implementations provide definitions for the names exported by the unit, and also contain expressions to be evaluated for their side-effects. From the file x.ml, the ocamlc compiler produces compiled object bytecode in the file x.cmo.
If the interface file x.mli exists, the implementation x.ml is checked against the corresponding compiled interface x.cmi, which is assumed to exist. If no interface x.mli is provided, the compilation of x.ml produces a compiled interface file x.cmi in addition to the compiled object code file x.cmo. The file x.cmi produced corresponds to an interface that exports everything that is defined in the implementation x.ml.
- Arguments ending in .cmo are taken to be compiled object bytecode. These files are linked together, along with the object files obtained by compiling .ml arguments (if any), and the OCaml standard library, to produce a standalone executable program. The order in which .cmo and .ml arguments are presented on the command line is relevant: compilation units are initialized in that order at run-time, and it is a link-time error to use a component of a unit before having initialized it. Hence, a given x.cmo file must come before all .cmo files that refer to the unit x.
- Arguments ending in .cma are taken to be libraries of object bytecode. A library of object bytecode packs in a single file a set of object bytecode files (.cmo files). Libraries are built with ocamlc -a (see the description of the -a option below). The object files contained in the library are linked as regular .cmo files (see above), in the order specified when the .cma file was built. The only difference is that if an object file contained in a library is not referenced anywhere in the program, then it is not linked in.
- Arguments ending in .c are passed to the C compiler, which generates a .o object file (.obj under Windows). This object file is linked with the program if the -custom flag is set (see the description of -custom below).
- Arguments ending in .o or .a (.obj or .lib under Windows) are assumed to be C object files and libraries. They are passed to the C linker when linking in -custom mode (see the description of -custom below).
- Arguments ending in .so (.dll under Windows) are assumed to be C shared libraries (DLLs). During linking, they are searched for external C functions referenced from the OCaml code, and their names are written in the generated bytecode executable. The run-time system ocamlrun then loads them dynamically at program start-up time.
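The pairing of x.mli and x.ml can be pictured, roughly, as a module constrained by a signature. In the sketch below, the `sig … end` part stands in for x.mli and the `struct … end` part for x.ml. The unit name X and its contents are purely illustrative.

```ocaml
(* Hypothetical unit X: the signature plays the role of x.mli,
   the structure plays the role of x.ml. *)
module X : sig
  type counter                  (* abstract: representation hidden by the interface *)
  val make : unit -> counter
  val incr : counter -> unit
  val get : counter -> int
end = struct
  type counter = int ref        (* concrete only inside the implementation *)
  let make () = ref 0
  let incr c = c := !c + 1
  let get c = !c
end
```

Other units see only the names and types listed in the signature. They cannot, for example, treat a `X.counter` as an `int ref`.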
The output of the linking phase is a file containing compiled bytecode that can be executed by the OCaml bytecode interpreter: the command named ocamlrun. If a.out is the name of the file produced by the linking phase, the command
ocamlrun a.out arg1 arg2 … argn
executes the compiled code contained in a.out, passing it as arguments the character strings arg1 to argn. (See chapter 11 for more details.)
On most systems, the file produced by the linking phase can be run directly, as in:
./a.out arg1 arg2 … argn
The produced file has the executable bit set, and it manages to launch the bytecode interpreter by itself.
The compiler is able to emit some information on its internal stages. If the option -bin-annot is passed to it (see the description of -bin-annot below), it outputs .cmt files for the implementation of the compilation unit and .cmti files for its signature. Each such file contains a typed abstract syntax tree (AST), which is produced during type checking. This tree contains all available information about the location and the specific type of each term in the source file. The AST is partial if type checking was unsuccessful.
These .cmt and .cmti files are typically useful for code inspection tools.
9.2 Options
The following command-line options are recognized by ocamlc. The options -pack, -a, -c and -output-obj are mutually exclusive.
- -a
- Build a library (.cma file) with the object files (.cmo files) given on the command line, instead of linking them into an executable file. The name of the library must be set with the -o option.
If -custom, -cclib or -ccopt options are passed on the command line, these options are stored in the resulting .cma library. Then, linking with this library automatically adds back the -custom, -cclib and -ccopt options as if they had been provided on the command line, unless the -noautolink option is given.
- -absname
- Force error messages to show absolute paths for file names.
- -annot
- Deprecated since OCaml 4.11. Please use -bin-annot instead.
- -args filename
- Read additional newline-terminated command line arguments from filename.
- -args0 filename
- Read additional null character terminated command line arguments from filename.
- -bin-annot
- Dump detailed information about the compilation (types, bindings, tail-calls, etc) in binary format. The information for file src.ml (resp. src.mli) is put into file src.cmt (resp. src.cmti). In case of a type error, dump all the information inferred by the type-checker before the error. The *.cmt and *.cmti files produced by -bin-annot contain more information and are much more compact than the files produced by -annot.
- -c
- Compile only. Suppress the linking phase of the compilation. Source code files are turned into compiled files, but no executable file is produced. This option is useful to compile modules separately.
- -cc ccomp
- Use ccomp as the C linker when linking in “custom runtime” mode (see the -custom option) and as the C compiler for compiling .c source files.
- -cclib -llibname
- Pass the -llibname option to the C linker when linking in “custom runtime” mode (see the -custom option). This causes the given C library to be linked with the program.
- -ccopt option
- Pass the given option to the C compiler and linker. When linking in “custom runtime” mode, for instance, -ccopt -Ldir causes the C linker to search for C libraries in directory dir. (See the -custom option.)
- -color mode
- Enable or disable colors in compiler messages (especially warnings and errors). The following modes are supported:
- auto
- use heuristics to enable colors only if the output supports them (an ANSI-compatible tty terminal);
- always
- enable colors unconditionally;
- never
- disable color output.
The default setting is auto, and the current heuristic checks that the TERM environment variable exists and is not empty or dumb, and that isatty(stderr) holds. The environment variable OCAML_COLOR is considered if -color is not provided. Its values are auto/always/never as above.
- -error-style mode
- Control the way error messages and warnings are printed. The following modes are supported:
- short
- only print the error and its location;
- contextual
- like short, but also display the source code snippet corresponding to the location of the error.
The default setting is contextual. The environment variable OCAML_ERROR_STYLE is considered if -error-style is not provided. Its values are short/contextual as above.
- -compat-32
- Check that the generated bytecode executable can run on 32-bit platforms and signal an error if it cannot. This is useful when compiling bytecode on a 64-bit machine.
- -config
- Print the version number of ocamlc and a detailed summary of its configuration, then exit.
- -config-var var
- Print the value of a specific configuration variable from the -config output, then exit. If the variable does not exist, the exit code is non-zero. This option is only available since OCaml 4.08, so script authors should have a fallback for older versions.
- -custom
- Link in “custom runtime” mode. In the default linking mode, the linker produces bytecode that is intended to be executed with the shared runtime system, ocamlrun. In the custom runtime mode, the linker produces an output file that contains both the runtime system and the bytecode for the program. The resulting file is larger, but it can be executed directly, even if the ocamlrun command is not installed. Moreover, the “custom runtime” mode enables static linking of OCaml code with user-defined C functions, as described in chapter 20.
Unix: Never use the strip command on executables produced by ocamlc -custom, this would remove the bytecode part of the executable.
Unix: Security warning: never set the “setuid” or “setgid” bits on executables produced by ocamlc -custom, this would make them vulnerable to attacks.
- -depend ocamldep-args
- Compute dependencies, as the ocamldep command would do. The remaining arguments are interpreted as if they were given to the ocamldep command.
- -dllib -llibname
- Arrange for the C shared library dlllibname.so (dlllibname.dll under Windows) to be loaded dynamically by the run-time system ocamlrun at program start-up time.
- -dllpath dir
- Adds the directory dir to the run-time search path for shared C libraries. At link-time, shared libraries are searched in the standard search path (the one corresponding to the -I option). The -dllpath option simply stores dir in the produced executable file, where ocamlrun can find it and use it as described in section 11.3.
- -for-pack module-path
- Generate an object file (.cmo) that can later be included as a sub-module (with the given access path) of a compilation unit constructed with -pack. For instance, ocamlc -for-pack P -c a.ml will generate a.cmo, which can later be used with ocamlc -pack -o P.cmo a.cmo. Note: you can still pack a module that was compiled without -for-pack, but in this case exceptions will be printed with the wrong names.
- -g
- Add debugging information while compiling and linking. This option is required in order to be able to debug the program with ocamldebug (see chapter 17), and to produce stack backtraces when the program terminates on an uncaught exception (see section 11.2).
- -i
- Cause the compiler to print all defined names (with their inferred types or their definitions) when compiling an implementation (.ml file). No compiled files (.cmo and .cmi files) are produced. This can be useful to check the types inferred by the compiler. Also, since the output follows the syntax of interfaces, it can help in writing an explicit interface (.mli file) for a file: just redirect the standard output of the compiler to a .mli file, and edit that file to remove all declarations of unexported names.
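As an illustration of -i, consider the small implementation below, assumed to live in a hypothetical file shapes.ml. The comment at the end shows the kind of interface the compiler should print. The exact layout may vary between compiler versions.

```ocaml
(* Contents of a hypothetical file shapes.ml. *)
type shape = Circle of float | Square of float

let area = function
  | Circle r -> Float.pi *. r *. r
  | Square s -> s *. s

let describe s = Printf.sprintf "area %.2f" (area s)

(* Running [ocamlc -i shapes.ml] should print an interface along the lines of:
     type shape = Circle of float | Square of float
     val area : shape -> float
     val describe : shape -> string
   Redirecting that output to shapes.mli gives a starting point for an
   explicit interface. *)
```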
- -I directory
- Add the given directory to the list of directories searched for compiled interface files (.cmi), compiled object code files (.cmo), libraries (.cma) and C libraries specified with -cclib -lxxx. By default, the current directory is searched first, then the standard library directory. Directories added with -I are searched after the current directory, in the order in which they were given on the command line, but before the standard library directory. See also option -nostdlib.
If the given directory starts with +, it is taken relative to the standard library directory. For instance, -I +unix adds the subdirectory unix of the standard library to the search path.
- -impl filename
- Compile the file filename as an implementation file, even if its extension is not .ml.
- -intf filename
- Compile the file filename as an interface file, even if its extension is not .mli.
- -intf-suffix string
- Recognize file names ending with string as interface files (instead of the default .mli).
- -labels
- Labels are not ignored in types, labels may be used in applications, and labelled parameters can be given in any order. This is the default.
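A minimal illustration of the default -labels behaviour. The function sub is a made-up example.

```ocaml
(* Illustrative function with two labelled parameters. *)
let sub ~x ~y = x - y

(* With labels enabled (the default), labelled arguments may be
   given in any order; both calls compute 10 - 3. *)
let a = sub ~x:10 ~y:3
let b = sub ~y:3 ~x:10
```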
- -linkall
- Force all modules contained in libraries to be linked in. If this flag is not given, unreferenced modules are not linked in. When building a library (option -a), setting the -linkall option forces all subsequent links of programs involving that library to link all the modules contained in the library. When compiling a module (option -c), setting the -linkall option ensures that this module will always be linked if it is put in a library and this library is linked.
- -make-runtime
- Build a custom runtime system (in the file specified by option -o) incorporating the C object files and libraries given on the command line. This custom runtime system can be used later to execute bytecode executables produced with the ocamlc -use-runtime runtime-name option. See section 20.1.6 for more information.
- -match-context-rows
- Set the number of rows of context used for optimization during pattern matching compilation. The default value is 32. Lower values cause faster compilation, but less optimized code. This advanced option is meant for use in the event that a pattern-match-heavy program leads to significant increases in compilation time.
- -no-alias-deps
- Do not record dependencies for module aliases. See section 8.8 for more information.
- -no-app-funct
- Deactivates the applicative behaviour of functors. With this option, each functor application generates new types in its result and applying the same functor twice to the same argument yields two incompatible structures.
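To see what -no-app-funct turns off, consider applying the standard Set.Make functor twice to the same module path. The sketch below type-checks under the default applicative semantics, because S1.t and S2.t are both equal to Set.Make(Int).t. Under -no-app-funct, the coercion in to_s2 would be expected to fail.

```ocaml
(* Two applications of the same functor to the same argument path. *)
module S1 = Set.Make (Int)
module S2 = Set.Make (Int)

(* Accepted by default: applicative functors make S1.t and S2.t the same
   type, so no conversion is needed. *)
let to_s2 (s : S1.t) : S2.t = s
```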
- -noassert
- Do not compile assertion checks. Note that the special form assert false is always compiled because it is typed specially. This flag has no effect when linking already-compiled files.
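The following sketch contrasts an ordinary assertion, which -noassert removes, with assert false, which is always compiled. Both functions are illustrative.

```ocaml
(* The assertion enforces the precondition. Compiled with -noassert, the
   check disappears, and a zero divisor would reach [/] and raise
   Division_by_zero instead of Assert_failure. *)
let checked_div a b =
  assert (b <> 0);
  a / b

(* [n mod 2] is always 0, 1 or -1 in OCaml, so the last case is unreachable.
   [assert false] is kept even under -noassert, which makes it a reliable
   marker for such cases. *)
let parity n =
  match n mod 2 with
  | 0 -> "even"
  | 1 | -1 -> "odd"
  | _ -> assert false
```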
- -noautolink
- When linking .cma libraries, ignore -custom, -cclib and -ccopt options potentially contained in the libraries (if these options were given when building the libraries). This can be useful if a library contains incorrect specifications of C libraries or C options; in this case, during linking, set -noautolink and pass the correct C libraries and options on the command line.
- -nolabels
- Ignore non-optional labels in types. Labels cannot be used in applications, and parameter order becomes strict.
- -nostdlib
- Do not include the standard library directory in the list of directories searched for compiled interface files (.cmi), compiled object code files (.cmo), libraries (.cma), and C libraries specified with -cclib -lxxx. See also option -I.
- -o exec-file
- Specify the name of the output file produced by the compiler. The default output name is a.out under Unix and camlprog.exe under Windows. If the -a option is given, specify the name of the library produced. If the -pack option is given, specify the name of the packed object file produced. If the -output-obj option is given, specify the name of the output file produced. If the -c option is given, specify the name of the object file produced for the next source file that appears on the command line.
- -opaque
- When the native compiler compiles an implementation, by default it produces a .cmx file containing information for cross-module optimization. It also expects .cmx files to be present for the dependencies of the currently compiled source, and uses them for optimization. Since OCaml 4.03, the compiler will emit a warning if it is unable to locate the .cmx file of one of those dependencies.
The -opaque option, available since 4.04, disables cross-module optimization information for the currently compiled unit. When compiling a .mli interface, using -opaque marks the compiled .cmi interface so that subsequent compilations of modules that depend on it will not rely on the corresponding .cmx file, nor warn if it is absent. When the native compiler compiles a .ml implementation, using -opaque generates a .cmx that does not contain any cross-module optimization information.
Using this option may degrade the quality of generated code, but it reduces compilation time, both on clean and incremental builds. Indeed, with the native compiler, when the implementation of a compilation unit changes, all the units that depend on it may need to be recompiled – because the cross-module information may have changed. If the compilation unit whose implementation changed was compiled with -opaque, no such recompilation needs to occur. This option can thus be used, for example, to get faster edit-compile-test feedback loops.
- -open Module
- Opens the given module before processing the interface or implementation files. If several -open options are given, they are processed in order, just as if the statements open! Module1;; ... open! ModuleN;; were added at the top of each file.
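For example, compiling the file below with ocamlc -open Printf should behave as if its first line were already present, so that line could then be omitted. The function describe is illustrative.

```ocaml
(* The effect of [ocamlc -open Printf], written out explicitly. *)
open! Printf

(* [sprintf] is found unqualified thanks to the open. *)
let describe n = sprintf "n = %d" n
```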
- -output-obj
- Cause the linker to produce a C object file instead of a bytecode executable file. This is useful to wrap OCaml code as a C library, callable from any C program. See chapter 20, section 20.7.5. The name of the output object file must be set with the -o option. This option can also be used to produce a C source file (.c extension) or a compiled shared/dynamic library (.so extension, .dll under Windows).
- -output-complete-exe
- Build a self-contained executable by linking a C object file containing the bytecode program, the OCaml runtime system and any other static C code given to ocamlc. The resulting effect is similar to -custom, except that the bytecode is embedded in the C code so it is no longer accessible to tools such as ocamldebug. On the other hand, the resulting binary is resistant to strip.
- -pack
- Build a bytecode object file (.cmo file) and its associated compiled interface (.cmi) that combines the object files given on the command line, making them appear as sub-modules of the output .cmo file. The name of the output .cmo file must be given with the -o option. For instance,
ocamlc -pack -o p.cmo a.cmo b.cmo c.cmo
generates compiled files p.cmo and p.cmi describing a compilation unit having three sub-modules A, B and C, corresponding to the contents of the object files a.cmo, b.cmo and c.cmo. These contents can be referenced as P.A, P.B and P.C in the remainder of the program.
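As a rough mental model only, the packed unit P resembles a module written by hand with one sub-module per packed object file. In the sketch below, the bodies of A, B and C are invented for illustration. In a real pack, each body comes from a.ml, b.ml and c.ml, compiled separately with -for-pack P.

```ocaml
(* Hand-written approximation of the unit produced by
   [ocamlc -pack -o p.cmo a.cmo b.cmo c.cmo]. *)
module P = struct
  module A = struct let base = 10 end
  module B = struct let double x = 2 * x end
  (* Inside the pack, C can refer to its siblings A and B directly. *)
  module C = struct let combined = B.double A.base end
end

(* The rest of the program refers to the sub-modules as P.A, P.B, P.C. *)
let result = P.C.combined
```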
- -pp command
- Cause the compiler to call the given command as a preprocessor for each source file. The output of command is redirected to an intermediate file, which is compiled. If there are no compilation errors, the intermediate file is deleted afterwards.
- -ppx command
- After parsing, pipe the abstract syntax tree through the preprocessor command. The module Ast_mapper, described in chapter 27, implements the external interface of a preprocessor.
- -principal
- Check information path during type-checking, to make sure that all types are derived in a principal way. When using labelled arguments and/or polymorphic methods, this flag is required to ensure future versions of the compiler will be able to infer types correctly, even if internal algorithms change. All programs accepted in -principal mode are also accepted in the default mode with equivalent types, but different binary signatures, and this may slow down type checking; yet it is a good idea to use it once before publishing source code.
- -rectypes
- Allow arbitrary recursive types during type-checking. By default, only recursive types where the recursion goes through an object type are supported. Note that once you have created an interface using this flag, you must use it again for all dependencies.
- -runtime-variant suffix
- Add the suffix string to the name of the runtime library used by the program. Currently, only one such suffix is supported: d, and only if the OCaml compiler was configured with option -with-debug-runtime. This suffix gives the debug version of the runtime, which is useful for debugging pointer problems in low-level code such as C stubs.
- -stop-after pass
- Stop compilation after the given compilation pass. The currently supported passes are: parsing, typing.
- -safe-string
- Enforce the separation between types string and bytes, thereby making strings read-only. This is the default.
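Under the default -safe-string mode, in-place modification goes through bytes rather than string. A minimal sketch, using a made-up helper capitalize_first:

```ocaml
(* Strings are immutable: edit a mutable bytes copy, then convert back. *)
let capitalize_first (s : string) : string =
  if s = "" then s
  else begin
    let b = Bytes.of_string s in                           (* fresh mutable copy *)
    Bytes.set b 0 (Char.uppercase_ascii (Bytes.get b 0));
    Bytes.to_string b                                      (* immutable result *)
  end
```

The argument string is never modified, because Bytes.of_string copies it.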
- -short-paths
- When a type is visible under several module-paths, use the shortest one when printing the type’s name in inferred interfaces and error and warning messages. Identifier names starting with an underscore _ or containing double underscores __ incur a penalty of +10 when computing their length.
- -strict-sequence
- Force the left-hand part of each sequence to have type unit.
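A sketch of code written to compile under -strict-sequence. The function drain_one is illustrative. Writing `Queue.pop q; Queue.length q` would be rejected in this mode, because Queue.pop returns an int rather than unit. Wrapping the call in ignore discards the result explicitly.

```ocaml
(* Remove the first element and report how many remain. [ignore] gives the
   left-hand side of the sequence type unit, as -strict-sequence requires. *)
let drain_one (q : int Queue.t) : int =
  ignore (Queue.pop q);
  Queue.length q
```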
- -strict-formats
- Reject invalid formats that were accepted in legacy format implementations. You should use this flag to detect and fix such invalid formats, as they will be rejected by future OCaml versions.
- -unboxed-types
- When a type is unboxable (i.e. a record with a single argument or a concrete datatype with a single constructor of one argument) it will be unboxed unless annotated with [@@ocaml.boxed].
- -no-unboxed-types
- When a type is unboxable it will be boxed unless annotated with [@@ocaml.unboxed]. This is the default.
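Both kinds of unboxable types mentioned above, annotated explicitly. Per the descriptions of the two options, an explicit [@@unboxed] annotation yields the unboxed representation whichever of -unboxed-types or -no-unboxed-types is in effect. The type names are illustrative.

```ocaml
(* Single constructor with a single argument: unboxable. *)
type user_id = User_id of int [@@unboxed]

(* Record with a single field: also unboxable. *)
type meters = { m : float } [@@unboxed]

(* Code using these types is written exactly as for boxed ones;
   only the runtime representation differs. *)
let next_id (User_id n) = User_id (n + 1)
let add (a : meters) (b : meters) = { m = a.m +. b.m }
```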
- -unsafe
- Turn bounds checking off for array and string accesses (the v.(i) and s.[i] constructs). Programs compiled with -unsafe are therefore slightly faster, but unsafe: anything can happen if the program accesses an array or string outside of its bounds. Additionally, turn off the check for zero divisor in integer division and modulus operations. With -unsafe, an integer division (or modulus) by zero can halt the program or continue with an unspecified result instead of raising a Division_by_zero exception.
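Because -unsafe removes the checks the compiler inserts for v.(i), s.[i], integer division and modulus, code that must behave predictably in that mode can test its preconditions explicitly. A sketch with illustrative helper names:

```ocaml
(* Explicit checks stay correct with or without -unsafe, since they do not
   rely on Invalid_argument or Division_by_zero being raised. *)
let get_opt (a : 'a array) (i : int) : 'a option =
  if i >= 0 && i < Array.length a then Some a.(i) else None

let div_opt (x : int) (y : int) : int option =
  if y = 0 then None else Some (x / y)
```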
- -unsafe-string
- Identify the types string and bytes, thereby making strings writable. This is intended for compatibility with old source code and should not be used with new software.
- -use-runtime runtime-name
- Generate a bytecode executable file that can be executed on the custom runtime system runtime-name, built earlier with ocamlc -make-runtime runtime-name. See section 20.1.6 for more information.
- -v
- Print the version number of the compiler and the location of the standard library directory, then exit.
- -verbose
- Print all external commands before they are executed, in particular invocations of the C compiler and linker in -custom mode. Useful to debug C library problems.
- -version or -vnum
- Print the version number of the compiler in short form (e.g. 3.11.0), then exit.
- -w warning-list
- Enable, disable, or mark as fatal the warnings specified by the argument warning-list. Each warning can be enabled or disabled, and each warning can be fatal or non-fatal. If a warning is disabled, it isn’t displayed and doesn’t affect compilation in any way (even if it is fatal). If a warning is enabled, it is displayed normally by the compiler whenever the source code triggers it. If it is enabled and fatal, the compiler will also stop with an error after displaying it.
The warning-list argument is a sequence of warning specifiers, with no separators between them. A warning specifier is one of the following:
- +num
- Enable warning number num.
- -num
- Disable warning number num.
- @num
- Enable and mark as fatal warning number num.
- +num1..num2
- Enable warnings in the given range.
- -num1..num2
- Disable warnings in the given range.
- @num1..num2
- Enable and mark as fatal warnings in the given range.
- +letter
- Enable the set of warnings corresponding to letter. The letter may be uppercase or lowercase.
- -letter
- Disable the set of warnings corresponding to letter. The letter may be uppercase or lowercase.
- @letter
- Enable and mark as fatal the set of warnings corresponding to letter. The letter may be uppercase or lowercase.
- uppercase-letter
- Enable the set of warnings corresponding to uppercase-letter.
- lowercase-letter
- Disable the set of warnings corresponding to lowercase-letter.
Warning numbers and letters which are out of the range of warnings that are currently defined are ignored. The warnings are as follows.
- 1
- Suspicious-looking start-of-comment mark.
- 2
- Suspicious-looking end-of-comment mark.
- 3
- Deprecated synonym for the ’deprecated’ alert.
- 4
- Fragile pattern matching: matching that will remain complete even if additional constructors are added to one of the variant types matched.
- 5
- Partially applied function: expression whose result has function type and is ignored.
- 6
- Label omitted in function application.
- 7
- Method overridden.
- 8
- Partial match: missing cases in pattern-matching.
- 9
- Missing fields in a record pattern.
- 10
- Expression on the left-hand side of a sequence that doesn’t have type unit (and that is not a function, see warning number 5).
- 11
- Redundant case in a pattern matching (unused match case).
- 12
- Redundant sub-pattern in a pattern-matching.
- 13
- Instance variable overridden.
- 14
- Illegal backslash escape in a string constant.
- 15
- Private method made public implicitly.
- 16
- Unerasable optional argument.
- 17
- Undeclared virtual method.
- 18
- Non-principal type.
- 19
- Type without principality.
- 20
- Unused function argument.
- 21
- Non-returning statement.
- 22
- Preprocessor warning.
- 23
- Useless record with clause.
- 24
- Bad module name: the source file name is not a valid OCaml module name.
- 25
- Deprecated: now part of warning 8.
- 26
- Suspicious unused variable: unused variable that is bound with let or as, and doesn’t start with an underscore (_) character.
- 27
- Innocuous unused variable: unused variable that is not bound with let nor as, and doesn’t start with an underscore (_) character.
- 28
- Wildcard pattern given as argument to a constant constructor.
- 29
- Unescaped end-of-line in a string constant (non-portable code).
- 30
- Two labels or constructors of the same name are defined in two mutually recursive types.
- 31
- A module is linked twice in the same executable.
- 32
- Unused value declaration.
- 33
- Unused open statement.
- 34
- Unused type declaration.
- 35
- Unused for-loop index.
- 36
- Unused ancestor variable.
- 37
- Unused constructor.
- 38
- Unused extension constructor.
- 39
- Unused rec flag.
- 40
- Constructor or label name used out of scope.
- 41
- Ambiguous constructor or label name.
- 42
- Disambiguated constructor or label name (compatibility warning).
- 43
- Nonoptional label applied as optional.
- 44
- Open statement shadows an already defined identifier.
- 45
- Open statement shadows an already defined label or constructor.
- 46
- Error in environment variable.
- 47
- Illegal attribute payload.
- 48
- Implicit elimination of optional arguments.
- 49
- Absent cmi file when looking up module alias.
- 50
- Unexpected documentation comment.
- 51
- Warning on non-tail calls if @tailcall present.
- 52 (see 9.5.2)
- Fragile constant pattern.
- 53
- Attribute cannot appear in this context.
- 54
- Attribute used more than once on an expression.
- 55
- Inlining impossible.
- 56
- Unreachable case in a pattern-matching (based on type information).
- 57 (see 9.5.3)
- Ambiguous or-pattern variables under guard.
- 58
- Missing cmx file.
- 59
- Assignment to non-mutable value.
- 60
- Unused module declaration.
- 61
- Unboxable type in primitive declaration.
- 62
- Type constraint on GADT type declaration.
- 63
- Erroneous printed signature.
- 64
- -unsafe used with a preprocessor returning a syntax tree.
- 65
- Type declaration defining a new ’()’ constructor.
- 66
- Unused open! statement.
- 67
- Unused functor parameter.
- A
- all warnings
- C
- warnings 1, 2.
- D
- Alias for warning 3.
- E
- Alias for warning 4.
- F
- Alias for warning 5.
- K
- warnings 32, 33, 34, 35, 36, 37, 38, 39.
- L
- Alias for warning 6.
- M
- Alias for warning 7.
- P
- Alias for warning 8.
- R
- Alias for warning 9.
- S
- Alias for warning 10.
- U
- warnings 11, 12.
- V
- Alias for warning 13.
- X
- warnings 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 30.
- Y
- Alias for warning 26.
- Z
- Alias for warning 27.
The default setting is -w +a-4-6-7-9-27-29-32..42-44-45-48-50-60. It is displayed by ocamlc -help. Note that warnings 5 and 10 are not always triggered, depending on the internals of the type checker.
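Two of the warnings enabled by default, and the usual way to write code that avoids them. The function names are illustrative.

```ocaml
(* Warning 8 (partial match) would fire on [function x :: _ -> Some x],
   which has no case for []. Covering every constructor avoids it. *)
let head_opt = function
  | [] -> None
  | x :: _ -> Some x

(* Warning 26 (suspicious unused variable) fires for an unused
   [let scratch = ...]. A leading underscore marks it as deliberate. *)
let answer () =
  let _scratch = head_opt [ 1; 2; 3 ] in
  42
```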
- -warn-error warning-list
- Mark as fatal the warnings specified in the argument warning-list. The compiler will stop with an error when one of these warnings is emitted. The warning-list has the same meaning as for the -w option: a + sign (or an uppercase letter) marks the corresponding warnings as fatal, a - sign (or a lowercase letter) turns them back into non-fatal warnings, and a @ sign both enables and marks as fatal the corresponding warnings.
Note: it is not recommended to use warning sets (i.e. letters) as arguments to -warn-error in production code, because this can break your build when future versions of OCaml add some new warnings.
The default setting is -warn-error -a+31 (only warning 31 is fatal).
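Because warning specifiers are concatenated with no separators, a string such as -a+31 or the default -w setting has to be split by looking at signs, digits, .. ranges and letters. The tokenizer below is a hypothetical sketch. It covers only the forms listed for -w and is not the compiler's own parser. It also does not map letters to warning numbers or apply the specifiers.

```ocaml
(* Hypothetical representation of one warning specifier. *)
type action = Enable | Disable | Enable_fatal
type target = Num of int | Range of int * int | Letter of char
type spec = { action : action; target : target }

let is_digit c = c >= '0' && c <= '9'
let is_upper c = c >= 'A' && c <= 'Z'
let is_letter c = is_upper c || (c >= 'a' && c <= 'z')

let parse_warning_list (s : string) : (spec list, string) result =
  let n = String.length s in
  (* Read a run of digits starting at [i]: Some (number, next index). *)
  let read_num i =
    let j = ref i in
    while !j < n && is_digit s.[!j] do incr j done;
    if !j = i then None
    else Some (int_of_string (String.sub s i (!j - i)), !j)
  in
  let rec go i acc =
    if i >= n then Ok (List.rev acc)
    else
      match s.[i] with
      | ('+' | '-' | '@') as sign ->
          let action =
            match sign with '+' -> Enable | '-' -> Disable | _ -> Enable_fatal
          in
          if i + 1 < n && is_letter s.[i + 1] then
            go (i + 2) ({ action; target = Letter s.[i + 1] } :: acc)
          else (
            match read_num (i + 1) with
            | None -> Error (Printf.sprintf "bad specifier at offset %d" i)
            | Some (lo, k) ->
                (* A ".." after the first number turns it into a range. *)
                if k + 1 < n && s.[k] = '.' && s.[k + 1] = '.' then (
                  match read_num (k + 2) with
                  | None -> Error (Printf.sprintf "bad range at offset %d" k)
                  | Some (hi, k') ->
                      go k' ({ action; target = Range (lo, hi) } :: acc))
                else go k ({ action; target = Num lo } :: acc))
      | c when is_letter c ->
          (* Bare letter: uppercase enables, lowercase disables. *)
          let action = if is_upper c then Enable else Disable in
          go (i + 1) ({ action; target = Letter c } :: acc)
      | c -> Error (Printf.sprintf "unexpected %C at offset %d" c i)
  in
  go 0 []
```

For example, parsing -a+31 yields a Disable of letter a followed by an Enable of warning 31. A complete implementation would then expand letters using the table above and apply the specifiers from left to right.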
- -warn-help
- Show the description of all available warning numbers.
- -where
- Print the location of the standard library, then exit.
- -with-runtime
- Include the runtime system in the generated program. This is the default.
- -without-runtime
- The compiler does not include the runtime system (nor a reference to it) in the generated program; it must be supplied separately.
- - file
- Process file as a file name, even if it starts with a dash (-) character.
- -help or --help
- Display a short usage summary and exit.
Contextual control of command-line options
The compiler command line can be modified “from the outside” with the following mechanisms. These are experimental and subject to change. They should be used only for experimental and development work, not in released packages.
- OCAMLPARAM (environment variable)
- A set of arguments that will be inserted before or after the arguments from the command line. Arguments are specified in a comma-separated list of name=value pairs. A _ is used to specify the position of the command line arguments, i.e. a=x,_,b=y means that a=x should be executed before parsing the arguments, and b=y after. Finally, an alternative separator can be specified as the first character of the string, within the set :|; ,.
- ocaml_compiler_internal_params (file in the stdlib directory)
- A mapping of file names to lists of arguments that will be added to the command line (and OCAMLPARAM) arguments.
- OCAML_FLEXLINK (environment variable)
- Alternative executable to use on native Windows for flexlink instead of the configured value. Primarily used for bootstrapping.
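The OCAMLPARAM format described above can be illustrated with a small parser. It is a hypothetical sketch written from that description, not the compiler's implementation. As an assumption of this sketch, strings with no _ marker or with more than one are rejected, since the text does not say how such strings are handled.

```ocaml
(* Hypothetical OCAMLPARAM parser: returns the name=value pairs to apply
   before and after the command-line arguments. *)
let parse_ocamlparam (s : string) :
    ((string * string) list * (string * string) list, string) result =
  (* An optional leading character from the set : | ; space , selects the
     separator; otherwise fields are comma-separated. *)
  let fields =
    if s = "" then []
    else
      match s.[0] with
      | (':' | '|' | ';' | ' ' | ',') as sep ->
          String.split_on_char sep (String.sub s 1 (String.length s - 1))
      | _ -> String.split_on_char ',' s
  in
  let rec go seen_marker before after = function
    | [] ->
        if seen_marker then Ok (List.rev before, List.rev after)
        else Error "no _ marker found"
    | "" :: rest -> go seen_marker before after rest   (* skip empty fields *)
    | "_" :: _ when seen_marker -> Error "more than one _ marker"
    | "_" :: rest -> go true before after rest
    | field :: rest -> (
        match String.index_opt field '=' with
        | None -> Error ("missing = in " ^ field)
        | Some i ->
            let binding =
              ( String.sub field 0 i,
                String.sub field (i + 1) (String.length field - i - 1) )
            in
            if seen_marker then go seen_marker before (binding :: after) rest
            else go seen_marker (binding :: before) after rest)
  in
  go false [] [] fields
```

With this sketch, a=x,_,b=y and :a=x:_:b=y both yield a=x before the command-line arguments and b=y after them.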
9.3 Modules and the file system
This short section is intended to clarify the relationship between the names of the modules corresponding to compilation units and the names of the files that contain their compiled interface and compiled implementation.
The compiler always derives the module name by taking the capitalized base name of the source file (.ml or .mli file). That is, it strips the leading directory name, if any, as well as the .ml or .mli suffix; then, it sets the first letter to uppercase, in order to comply with the requirement that module names must be capitalized. For instance, compiling the file mylib/misc.ml provides an implementation for the module named Misc. Other compilation units may refer to components defined in mylib/misc.ml under the names Misc.name; they can also do open Misc, then use unqualified names name.
The .cmi and .cmo files produced by the compiler have the same base name as the source file. Hence, the compiled files always have their base name equal (modulo capitalization of the first letter) to the name of the module they describe (for .cmi files) or implement (for .cmo files).
When the compiler encounters a reference to a free module identifier Mod, it looks in the search path for a file named Mod.cmi or mod.cmi and loads the compiled interface contained in that file. As a consequence, renaming .cmi files is not advised: the name of a .cmi file must always correspond to the name of the compilation unit it implements. It is admissible to move them to another directory, if their base name is preserved, and the correct -I options are given to the compiler. The compiler will flag an error if it loads a .cmi file that has been renamed.
Compiled bytecode files (.cmo files), on the other hand, can be freely renamed once created. That’s because the linker never attempts to find by itself the .cmo file that implements a module with a given name: it relies instead on the user providing the list of .cmo files by hand.
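The naming rule above can be sketched in OCaml as a few functions over file paths. The names module_name_of_source, compiled_interface_of_source and compiled_object_of_source are hypothetical helpers written for illustration, not part of the compiler: they only encode the rule as stated here (strip the directory, strip the .ml or .mli suffix, capitalize the first letter), and they assume the compiled files are written next to the source, as with a plain ocamlc -c and no -o option.

```ocaml
(* Hypothetical helper: encodes the module-naming rule described above.
   It is an illustration, not the compiler's own implementation. *)
let module_name_of_source path =
  (* Strip the leading directory name, if any. *)
  let base = Filename.basename path in
  (* Strip the .ml or .mli suffix. *)
  let stem =
    if Filename.check_suffix base ".mli" then Filename.chop_suffix base ".mli"
    else if Filename.check_suffix base ".ml" then Filename.chop_suffix base ".ml"
    else invalid_arg "module_name_of_source: expected a .ml or .mli file"
  in
  (* Upper-case the first letter: module names must be capitalized. *)
  String.capitalize_ascii stem

(* Hypothetical helpers: compiled files keep the source's base name,
   only the suffix changes (assuming the default output location). *)
let compiled_interface_of_source path =
  Filename.remove_extension path ^ ".cmi"

let compiled_object_of_source path =
  Filename.remove_extension path ^ ".cmo"
```

For example, module_name_of_source "mylib/misc.ml" gives "Misc", and the matching compiled interface is mylib/misc.cmi.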
9.4 Common errors
This section describes and explains the most frequently encountered error messages.
- Cannot find file filename
- The named file could not be found in the current directory, nor in the directories of the search path. The filename is either a compiled interface file (.cmi file), or a compiled bytecode file (.cmo file). If filename has the format mod.cmi, this means you are trying to compile a file that references identifiers from module mod, but you have not yet compiled an interface for module mod. Fix: compile mod.mli or mod.ml first, to create the compiled interface mod.cmi.
If filename has the format mod.cmo, this means you are trying to link a bytecode object file that does not exist yet. Fix: compile mod.ml first.
If your program spans several directories, this error can also appear because you haven’t specified the directories to look into. Fix: add the correct -I options to the command line.
- Corrupted compiled interface filename
- The compiler produces this error when it tries to read a compiled interface file (.cmi file) that has the wrong structure. This means something went wrong when this .cmi file was written: the disk was full, the compiler was interrupted in the middle of the file creation, and so on. This error can also appear if a .cmi file is modified after its creation by the compiler. Fix: remove the corrupted .cmi file, and rebuild it.
- This expression has type t1, but is used with type t2
- This is by far the most common type error in programs. Type t1 is the type inferred for the expression (the part of the program that is displayed in the error message), by looking at the expression itself. Type t2 is the type expected by the context of the expression; it is deduced by looking at how the value of this expression is used in the rest of the program. If the two types t1 and t2 are not compatible, then the error above is produced.
In some cases, it is hard to understand why the two types t1 and t2 are incompatible. For instance, the compiler can report that “expression of type foo cannot be used with type foo”, and it really seems that the two types foo are compatible. This is not always true. Two type constructors can have the same name, but actually represent different types. This can happen if a type constructor is redefined. Example:
type foo = A | B
let f = function A -> 0 | B -> 1
type foo = C | D
f C
This results in the error message “expression C of type foo cannot be used with type foo”.
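One way to avoid this confusion, sketched below as a suggestion rather than a rule from this manual, is to keep the two definitions in separate modules so that each type has a distinct qualified name (First.foo and Second.foo). The module names First and Second and the conversion function to_first are hypothetical.

```ocaml
(* Hypothetical modules: each foo now has its own qualified name. *)
module First = struct
  type foo = A | B
  let f = function A -> 0 | B -> 1
end

module Second = struct
  type foo = C | D
  (* Explicit conversion, needed to pass a Second.foo to First.f. *)
  let to_first = function C -> First.A | D -> First.B
end

let result = First.f (Second.to_first Second.C)
```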
- The type of this expression, t, contains type variables that cannot be generalized
- Type variables ('a, 'b, …) in a type t can be in either of two states: generalized (which means that the type t is valid for all possible instantiations of the variables) and not generalized (which means that the type t is valid only for one instantiation of the variables). In a let binding let name = expr, the type-checker normally generalizes as many type variables as possible in the type of expr. However, unrestricted generalization is unsound (a well-typed program could crash) in the presence of polymorphic mutable data structures. To avoid this, generalization is performed at let bindings only if the bound expression expr belongs to the class of “syntactic values”, which includes constants, identifiers, functions, tuples of syntactic values, etc. In all other cases (for instance, when expr is a function application), a polymorphic mutable value could have been created, and generalization is therefore turned off for all variables occurring in contravariant or non-variant branches of the type. For instance, if the type of a non-value is 'a list, the variable is generalizable (list is a covariant type constructor), but not in 'a list -> 'a list (the left branch of -> is contravariant) or 'a ref (ref is non-variant).
Non-generalized type variables in a type cause no difficulties inside a given structure or compilation unit (the contents of a .ml file, or an interactive session), but they cannot be allowed in signatures or in compiled interfaces (.cmi files), because they could be used inconsistently later. Therefore, the compiler flags an error when a structure or compilation unit defines a value name whose type contains non-generalized type variables. There are two ways to fix this error:
- Add a type constraint or a .mli file to give a monomorphic type (without type variables) to name. For instance, instead of writing
let sort_int_list = List.sort Stdlib.compare
(* inferred type 'a list -> 'a list, with 'a not generalized *)
write
let sort_int_list = (List.sort Stdlib.compare : int list -> int list);;
- If you really need name to have a polymorphic type, turn its defining expression into a function by adding an extra parameter. For instance, instead of writing
let map_length = List.map Array.length
(* inferred type 'a array list -> int list, with 'a not generalized *)
write
let map_length lv = List.map Array.length lv
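The sketch below illustrates the cases described above in a single .ml file: a syntactic value that is fully generalized, a function application whose type variable occurs only covariantly (and is therefore still generalized), a non-variant ref whose weak type variable is fixed by its first use later in the same unit, and the two fixes just given. The names pair_with, empty and cache are hypothetical.

```ocaml
(* A function is a syntactic value: its type 'a -> 'b -> 'a * 'b is
   generalized, so it can be used at several types. *)
let pair_with x y = (x, y)
let _ = (pair_with 1 "a", pair_with true 2.0)

(* An application, but 'a occurs only in 'a list (covariant), so it is
   still generalized and empty can be used at several types. *)
let empty = List.rev []
let (_ : int list) = empty
let (_ : string list) = empty

(* An application with 'a under ref (non-variant): not generalized.
   The first use fixes the variable to int, so the unit is accepted. *)
let cache = ref []
let () = cache := [1; 2]

(* The two fixes: a monomorphic type constraint, or an extra parameter. *)
let sort_int_list = (List.sort Stdlib.compare : int list -> int list)
let map_length lv = List.map Array.length lv
```

Because map_length takes an explicit parameter, it remains polymorphic and works on lists of int arrays and of string arrays alike.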
- Reference to undefined global mod
- This error appears when trying to link an incomplete or incorrectly ordered set of files. Either you have forgotten to provide an implementation for the compilation unit named mod on the command line (typically, the file named mod.cmo, or a library containing that file). Fix: add the missing .ml or .cmo file to the command line. Or, you have provided an implementation for the module named mod, but it comes too late on the command line: the implementation of mod must come before all bytecode object files that reference mod. Fix: change the order of .ml and .cmo files on the command line.
Of course, you will always encounter this error if you have mutually recursive functions across modules. That is, function Mod1.f calls function Mod2.g, and function Mod2.g calls function Mod1.f. In this case, no matter what permutations you perform on the command line, the program will be rejected at link-time. Fixes:
- Put f and g in the same module.
- Parameterize one function by the other. That is, instead of having
mod1.ml: let f x = ... Mod2.g ...
mod2.ml: let g y = ... Mod1.f ...
define
mod1.ml: let f g x = ... g ...
mod2.ml: let rec g y = ... Mod1.f g ...
and link mod1.cmo before mod2.cmo.
- Use a reference to hold one of the two functions, as in:
mod1.ml: let forward_g =
ref((fun x -> failwith "forward_g") : <type>)
let f x = ... !forward_g ...
mod2.ml: let g y = ... Mod1.f ...
let _ = Mod1.forward_g := g
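The last two fixes can be checked in a single file by simulating the two compilation units as submodules. This is an assumption made for the sake of a self-contained example: in a real program mod1.ml and mod2.ml are separate files, linked with mod1.cmo first. The bodies of f and g are hypothetical; each counts its argument down to zero, handing control to the other function at every step.

```ocaml
(* Parameterization: Mod1.f takes g as an argument, so Mod1 no longer
   refers to Mod2 and can be initialized first. *)
module Parameterized = struct
  module Mod1 = struct
    let f g x = if x <= 0 then 0 else 1 + g (x - 1)
  end
  module Mod2 = struct
    let rec g y = if y <= 0 then 0 else 1 + Mod1.f g (y - 1)
  end
end

(* Forward reference: Mod1 calls whatever function forward_g holds;
   Mod2 stores g there when it is initialized. *)
module Forward = struct
  module Mod1 = struct
    let forward_g : (int -> int) ref =
      ref (fun _ -> failwith "forward_g")
    let f x = if x <= 0 then 0 else 1 + !forward_g (x - 1)
  end
  module Mod2 = struct
    let g y = if y <= 0 then 0 else 1 + Mod1.f (y - 1)
    let () = Mod1.forward_g := g
  end
end
```

With the forward reference, calling Mod1.f before Mod2 has been initialized would raise Failure "forward_g", which is why the link order still matters.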
- The external function f is not available
- This error appears when trying to link code that calls external functions written in C. As explained in chapter 20, such code must be linked with C libraries that implement the required f C function. If the C libraries in question are not shared libraries (DLLs), the code must be linked in “custom runtime” mode. Fix: add the required C libraries to the command line, and possibly the -custom option.
9.5 Warning reference
This section describes and explains in detail some warnings:
9.5.1 Warning 9: missing fields in a record pattern
When pattern matching on records, it can be useful to match only a few fields of a record. Fields can be elided either implicitly, or explicitly by ending the record pattern with ; _. However, implicit field elision is at odds with pattern-matching exhaustiveness checks. Enabling warning 9 prioritizes exhaustiveness checks over the convenience of implicit field elision: the compiler then warns on implicit field elision in record patterns. In particular, this warning can help to spot exhaustive record patterns that may need to be updated after new fields are added to a record type.
type 'a point = {x : 'a; y : 'a}
let dx { x } = x (* implicit field elision: trigger warning 9 *)
let dy { y; _ } = y (* explicit field elision: do not trigger warning 9 *)
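A slightly longer sketch follows, assuming warning 9 is turned on for the whole file with the floating attribute [@@@warning "+9"] (passing -w +9 to ocamlc has the same effect). The function swap is hypothetical: it names every field, so it triggers no warning today, but it would be flagged by warning 9 if a field were later added to point.

```ocaml
(* Enable warning 9 for the rest of this file. *)
[@@@warning "+9"]

type 'a point = { x : 'a; y : 'a }

(* Explicit elision with "; _": accepted without warning 9. *)
let dx { x; _ } = x
let dy { y; _ } = y

(* Every field is named, so nothing is elided and no warning is emitted.
   Adding a field to point would turn this into an implicit elision. *)
let swap { x; y } = { x = y; y = x }
```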
9.5.2 Warning 52: fragile constant pattern
Some constructors, such as the exception constructors Failure and Invalid_argument, take as parameter a string value holding a text message intended for the user.
These text messages are usually not stable over time: call sites building these constructors may refine the message in a future version to make it more explicit, etc. Therefore, it is dangerous to match over the precise value of the message. For example, until OCaml 4.02, Array.iter2 would raise the exception
Invalid_argument "arrays must have the same length"
Since 4.03 it raises the more helpful message
Invalid_argument "Array.iter2: arrays must have the same length"
but this means that any code of the form
try ...
with Invalid_argument "arrays must have the same length" -> ...
is now broken and may suffer from uncaught exceptions.
Warning 52 is there to prevent users from writing such fragile code in the first place. It does not occur on every matching on a literal string, but only in the case in which library authors expressed their intent to possibly change the constructor parameter value in the future, by using the attribute ocaml.warn_on_literal_pattern (see the manual section on builtin attributes in 8.12.1):
type t =
| Foo of string [@ocaml.warn_on_literal_pattern]
| Bar of string
let no_warning = function
| Bar "specific value" -> 0
| _ -> 1
let warning = function
| Foo "specific value" -> 0
| _ -> 1
> | Foo "specific value" -> 0
> ^^^^^^^^^^^^^^^^
> Warning 52: Code should not depend on the actual values of
> this constructor's arguments. They are only for information
> and may change in future versions. (See manual section 8.5)
In particular, all built-in exceptions with a string argument have this attribute set: Invalid_argument, Failure, Sys_error will all raise this warning if you match for a specific string argument.
Additionally, built-in exceptions with a structured argument that includes a string also have the attribute set: Assert_failure and Match_failure will raise the warning for a pattern that uses a literal string to match the first element of their tuple argument.
If your code raises this warning, you should not silence it by changing the way you test for the specific string (for example, by using a string equality in the right-hand side instead of a literal pattern), as your code would remain fragile. You should instead enlarge the scope of the pattern by matching on all possible values.
let warning = function
| Foo _ -> 0
| _ -> 1
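Applied to the Array.iter2 example above, matching on any Invalid_argument rather than on a particular message gives code that keeps working whichever wording the library uses. The function dot is a hypothetical example, not a library function.

```ocaml
(* Hypothetical helper: dot product of two int arrays, or None when the
   lengths differ. Any Invalid_argument message is accepted, so the
   handler does not depend on the exact text chosen by Array.iter2. *)
let dot a b =
  let acc = ref 0 in
  match Array.iter2 (fun x y -> acc := !acc + x * y) a b with
  | () -> Some !acc
  | exception (Invalid_argument _) -> None
```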
This may require some care: if the scrutinee may return several different cases of the same pattern, or raise distinct instances of the same exception, you may need to modify your code to separate those several cases.
For example,
try (int_of_string count_str, bool_of_string choice_str) with
| Failure "int_of_string" -> (0, true)
| Invalid_argument "bool_of_string" -> (-1, false)
should be rewritten into more atomic tests. For example, using the exception patterns documented in Section 7.6, one can write:
match int_of_string count_str with
| exception (Failure _) -> (0, true)
| count ->
begin match bool_of_string choice_str with
| exception (Invalid_argument _) -> (-1, false)
| choice -> (count, choice)
end
The only case where that transformation is not possible is when a given function call may raise distinct exceptions with the same constructor but different string values. In this case, you will have to check for specific string values. This is dangerous API design and should be discouraged: it is better to define more precise exception constructors than to store useful information in strings.
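A self-contained version of the rewritten example follows, wrapped in a hypothetical function parse. In the standard library, int_of_string reports errors with Failure, while bool_of_string uses Invalid_argument. Since OCaml 4.05, the option-returning variants int_of_string_opt and bool_of_string_opt avoid exceptions altogether; parse_opt (also hypothetical) shows that alternative design.

```ocaml
(* Each conversion gets its own handler, so neither depends on the text
   of the exception message. *)
let parse count_str choice_str =
  match int_of_string count_str with
  | exception (Failure _) -> (0, true)
  | count ->
    (match bool_of_string choice_str with
     | exception (Invalid_argument _) -> (-1, false)
     | choice -> (count, choice))

(* Same behavior without exceptions, using the _opt variants. *)
let parse_opt count_str choice_str =
  match int_of_string_opt count_str with
  | None -> (0, true)
  | Some count ->
    (match bool_of_string_opt choice_str with
     | None -> (-1, false)
     | Some choice -> (count, choice))
```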
9.5.3 Warning 57: Ambiguous or-pattern variables under guard
The semantics of or-patterns in OCaml is specified with a left-to-right bias: a value v matches the pattern p | q if it matches p or q, but if it matches both, the environment captured by the match is the environment captured by p, never the one captured by q.
While this property is generally intuitive, there is at least one specific case where a different semantics might be expected. Consider a pattern followed by a when-guard: | p when g -> e, for example:
| ((Const x, _) | (_, Const x)) when is_neutral x -> branch
The semantics is clear: match the scrutinee against the pattern; if it matches, test the guard; and if the guard passes, take the branch. In particular, consider the input (Const a, Const b), where a fails the test is_neutral a, while b passes the test is_neutral b. With the left-to-right semantics, the clause above is not taken for this input: matching (Const a, Const b) against the or-pattern succeeds in the left branch and returns the environment x -> a; the guard is_neutral a is then tested and fails, so the branch is not taken.
However, another semantics may be considered more natural here: any pair that has one side passing the test will take the branch. With this semantics the previous code fragment would be equivalent to
| (Const x, _) when is_neutral x -> branch
| (_, Const x) when is_neutral x -> branch
This is not the semantics adopted by OCaml.
Warning 57 is dedicated to these confusing cases where the specified left-to-right semantics is not equivalent to a non-deterministic semantics (any branch can be taken) with respect to a specific guard. More precisely, it warns when the guard uses “ambiguous” variables, that is, variables bound to different parts of the scrutinee by different sides of an or-pattern.
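The sketch below contrasts the two semantics on the input discussed above. The type expr and the guard is_neutral are hypothetical stand-ins for the names used in the text; compiling ambiguous is expected to trigger warning 57, while explicit spells out the non-deterministic reading as two separate clauses.

```ocaml
(* Hypothetical type and guard, for illustration only. *)
type expr = Const of int | Var of string

(* 0 is the neutral element of addition. *)
let is_neutral x = (x = 0)

(* Left-to-right or-pattern under a guard: warning 57 is reported here.
   On (Const 1, Const 0), x is bound to 1, the guard fails, and the
   clause is skipped even though the right side would pass. *)
let ambiguous = function
  | ((Const x, _) | (_, Const x)) when is_neutral x -> "neutral"
  | _ -> "other"

(* The "either side may pass" reading, written as two clauses. *)
let explicit = function
  | (Const x, _) when is_neutral x -> "neutral"
  | (_, Const x) when is_neutral x -> "neutral"
  | _ -> "other"
```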