What's the technical definition for “routine”?

I'm studying the Lisp language (to write Lisp routines), and in a general context I know what a routine is, but I can't talk about it in a technical context, because I'm only starting to learn about routines now. So, what's the real definition of a routine?
(I've already Googled this but didn't find anything.)

The term routine derives from subroutine, which is a more common term in languages like BASIC, where one actually creates SUBroutines. (BASIC actually distinguished between a SUBroutine and a FUNCTION, but nevertheless...)
From the Wikipedia entry:
In computer science, a subroutine (also called procedure, function, routine, method, or subprogram) is a portion of code within a larger program that performs a specific task and is relatively independent of the remaining code.
As the name "subprogram" suggests, a subroutine behaves in much the same way as a computer program that is used as one step in a larger program or another subprogram. A subroutine is often coded so that it can be started ("called") several times and/or from several places during a single execution of the program, including from other subroutines, and then branch back (return) to the next instruction after the "call" once the subroutine's task is done.
Different languages/environments/eras have different ecosystems and thus different terms to describe the same general concept. I generally only use the term function (or method in an "OOP" environment) these days.
Happy coding.
For fun I have made this answer Community Wiki. The list below hopefully covers which term(s) are "correct" (widely accepted) in a given language to mean routine. Informally, routine is used in the context of all the languages below, so it should be omitted unless it is the de facto term used. Feel free to add, correct, and annotate as appropriate.
C - function
Java - method. While function is also often used, the term function does not appear in the Java Language Specification.
C# - method and function. In the specification, functions refer to function-objects and anonymous functions. They are not the same as methods, which are members of types (classes or structures). Also consider delegates.
JavaScript - function or method. Methods are functions accessed via a property of an object.
Haskell - function. This is the accepted terminology.
Scala - function or method. Method if it is a def member of a type; functions are first-class values.
BASIC - function or subroutine. Subroutines do not return values. Supports call-by-reference.
FORTRAN - function or subroutine. Subroutines do not return values. Supports call-by-reference.
LISP - function. DEFUN -> DEfineFUNction, all forms are valid expressions. Also consider macros, which are not themselves functions but are arguably routines.
VHDL - subprograms: functions and procedures. Procedures have no return value.
Smalltalk - method
Python - method
Ruby - method (often interchanged with function? lambdas/Procs may be considered different?)
Perl - function and subroutine. There is only one form to declare a function/SUBroutine so there is no distinction w.r.t. return values. Using method (for object-bound functions) seems less prevalent than in other languages.
Pascal - procedures and functions
Ada - procedures and functions

You can't find a technical definition because there isn't a technical definition specific to lisp. A 'routine', outside of vaudeville, is just another name for a function. While it's been many years since I programmed in Lisp full-time, no one ever used that term in any formal way, or even used it commonly. We talked about 'functions', 'macros', and 'forms.' If someone said, 'oh, there's a routine to calculate how many apples in a pie' it was perfectly informal.


How does a modern compiler know an operation will not have side effects?

I have been reading that compilers will optimize operations that have no side effects.
I assume this is done very conservatively, but how does the compiler know?
Does it have a lookup table of side-effect-free operations, or does it determine this some other way?
It performs a static analysis of the code during compilation.
Remember, it can tell whether the code only does local calculations and doesn't do pointer math, use globals, or modify data that gets passed in by reference somehow. Any functions it calls must also be side-effect-free (apply this recursively).
The programmer might also give the compiler some hints: see for example noalias.
Side effects are primarily a potential result of function calls.
In some cases, the possible side effects are encoded in the parameter declarations. C even has a special qualifier, restrict, which, together with const and volatile, gives the compiler direct information about possible side effects.
So, if control passes outside the immediate expression being evaluated, via method or function calls, the compiler needs to either statically analyze the entire code path or make a determination based on the formal parameters.
In the worst case, the compiler may need to assume that anything in memory may have changed following a method or function call.
To arrange for the best possible optimization for your code, in computation-intensive areas of code use features of the language to make parameters as restricted as possible. Copy globals that are not modified by expressions to locals before use; this helps the compiler in several ways. Restrict the visibility of all objects as much as possible. (Use locals, then instance variables, then class variables, avoid global anything.)
Simplistically, there is a list of side-effect-free (aka pure) operations. But most operations are only somewhat side-effect-free. For example, in Java, reading from an array is pure only if you can statically tell that it's in bounds. Many other analyses, such as Value Range Propagation, can help with this.
Typically, you want to tell whether a function call is pure. For built-in functions, the compiler might just have a list (although a function modelled by the compiler, such as sin in C, might be side-effect-free in some cases and might modify errno in others).
To tell if a function is pure, you must know that it doesn't write to memory which escapes - that is, global variables, class variables, or any memory accessible through the return value or parameters. This is called mod-ref analysis (aka alias analysis or pointer analysis, and the same basic idea as escape analysis). A simplified algorithm is:
collect all names (global x, y->f, *y, **y etc) in the program
construct a graph of names which may point to the same memory location
when a name is written to (i.e. *x = 5;), get a list of the other names which might also be written to
if one of those escapes from the function, then the function is not pure.
This isn't the greatest explanation, I think, suggestions and questions welcome.
Note that some languages, like C++, provide annotations for purity (a const member function, not to be confused with pure virtual functions), and even allow some advanced features like mutable. I've been told that they aren't particularly useful for optimization, though, but I've not written a C++ compiler so I don't know why. I would speculate that separate compilation ruins their fun, perhaps.

Safety via external formal proofs

Safety is commonly implemented via strong static typing. Although there are very powerful type systems (e.g. dependent typing), none of them is powerful enough to express arbitrary formal proofs about code. Another problem is that type systems are tightly coupled to a single programming language, which inhibits refactoring of formal proofs.
One possible framework I am thinking of is a compiler that activates optimizations when the programmer (or an automated tool) provides formal proofs that allow them. Examples are uniqueness, termination, array bounds checking, memory management, safety, etc.
Is there any programming language that implements in some way this concept?
I am aware of proof carrying code, but it is normally implemented as a conventional type system and a compiler that proves type safety under program transformation.
The code generator for the Isabelle/HOL proof assistant is based on the principle you describe. One can specify an abstract relation in a declarative way, meaning that there is no effective algorithm to check that it holds for arbitrary input (though it might be possible to prove it for some specific inputs). Then one defines a function to check whether the relation holds for arbitrary values. This includes some steps to ensure that the function is computable, e.g. showing that it terminates. Next, one proves that the relation and the function can indeed be used in place of each other. Finally, with this proof, it becomes possible to tell Isabelle that the function can be used to generate code (in one of the supported functional languages) for the original predicate.
Of course, if there are two such functions, a preferred one can be selected for code generation. This can also be viewed as an optimization that produces provably identical results.

How can I compile VM-specific code into x86 machine code?

I have a 16-bit register-based virtual machine, and I want to know what the steps are for compiling its code to actual x86 machine code. I'm not looking to make a JIT compiler unless it is necessary to be able to link the compiled code with another executable / DLL.
The VM is made such that if the VM is added to a project, special language constructs can be added. (for example, if it is embedded into a game engine, an "Entity" object type may be added, and several C functions from the engine might be exposed.) This will cause the code to be completely dependent on certain exposed C functions or exposed C++ classes, in the application it's embedded into.
How would this sort of "linking" be possible if the script code is compiled from VM bytecode into a native EXE?
It is also register-based like Lua's VM, in that all basic variables are stored in "registers", which is a huge C array. A register pointer is incremented or decremented when the scope changes, so register numbers are relative, similar to a stack pointer. E.g.:
int a = 5;
{
    int a = 1;
}
might be, in virtual machine pseudo-assembly:
mov_int (%r0, $5)
; new scope, the "register pointer" is then incremented by the number
; of bytes that are used to store local variables in this new scope. E.g. int = 4 bytes
; say $rp is the "register pointer"
add (%rp, $4) ; since size of int is usually 4 bytes
; this is if registers are 1 byte in size; if they were
; 4 bytes in size it would just be adding $1
mov_int (%r0, $1) ; now each register "index" is offset by 4,
; this is now technically setting %r4
; different instructions are used to get values above current scope
sub (%rp, $4) ; end of scope so reset %rp
My question about this part is, would I have to use the stack pointer for this sort of thing? The base pointer? What could I use to replace this concept?
The VM is made such that if the VM is added to a project, special language constructs can be added. (for example, if it is embedded into a game engine, an "Entity" object type may be added, and several C functions from the engine might be exposed.) This will cause the code to be completely dependent on certain exposed C functions or exposed C++ classes, in the application it's embedded into.
There are many ways to implement this kind of cross-language interfacing. Whether you're running VM bytecode or native machine code isn't going to matter much here unless you need an interface with very low overhead. The main consideration is the nature of your language, especially whether it has static or dynamic typing.
Generally speaking these are the two most common approaches (you may already be familiar with them):
(a) The 'foreign-function-interface' approach, where your language/runtime offers facilities to automatically wrap functions and data from C. Examples include LuaJIT FFI, js-ctypes and P/Invoke. Most FFIs can operate on CDECL/STDCALL functions and POD structures; some have varying levels of support for C++ or COM classes.
(b) The 'runtime-API' approach, where your runtime exposes a C API you can use to manually construct/manipulate objects for use in your language. Lua has an extensive API for this, as does Python.
How would this sort of "linking" be possible if the script code is compiled from VM bytecode into a native EXE?
So you're probably thinking about how to e.g. bake foreign function addresses into your generated machine code. Well, if you have the proper FFI infrastructure in place, there's no reason you can't do this, as long as you're aware of how shared-library imports work (import address tables, relocations, fixups, etc.).
If you don't know much about shared libraries, I think that by spending some time researching that area you'll start to get a much clearer idea of the ways you can implement FFI in your compiler.
However, it would probably be easier to take a slightly more dynamic approach, e.g. LoadLibrary() and GetProcAddress(), then wrap the function pointer as an object of your language.
It's unfortunately very hard to give more specific suggestions without knowing anything about the language/VM in question.
[…] My question about this part is, would I have to use the stack pointer for this sort of thing? The base pointer? What could I use to replace this concept?
I'm not entirely sure what the purpose of this 'register array' scheme is.
In a language with lexical scoping, it's my understanding that when compiling a function you typically enumerate every variable declared in its body and allocate a block of stack space large enough to hold all the variables (ignoring the complex topic of CPU register allocation). The code can address these variables using the stack pointer or (more often) the base pointer.
If a variable in an inner scope shadows a variable in an outer scope like your example, they're assigned separate memory spaces on the stack — because as far as the compiler is concerned they are different variables.
Without understanding the rationale behind the scheme your VM is using, I can't really suggest how it should translate to machine code. Maybe someone with more experience writing bytecode compilers can give you the answer you're after.
However, it may be that your VM's approach is actually similar to what I've described, in which case adapting it for machine-code compilation should be very straightforward: just a matter of translating your virtual local-variable memory space into stack space.
If I understand your question correctly, then yes, you will have to use the SP/BP etc. here. That's what compiling to native machine code means: Translate the higher-level behaviour of your program into equivalent machine instructions that follow the conventions of the operating system it is running on.
So you would essentially have to do the same things you'd have to do to call the host-provided functions if you called them from assembler. That usually means sticking the values of the function arguments in the appropriate registers or pushing them on the stack, converting them as needed, then generating a CALL or JMP instruction, or whatever the CPU expects, to actually jump to the memory address of the given function.
You'd need to have a table of function name to function pointer mappings that the host provides to you, and look the address up from there.
Once the function returns, you would convert the values the function returned back to your internal types if needed and go on your merry way. (This is basically what all those "foreign function interface" libraries do internally).
Depending on your language and what it is used for, it might also be possible to cheat here. You could use your own internal pseudo-stack, and just add a special "call a native function" instruction. This instruction would receive information about the function as a parameter (e.g. what parameter types it takes/returns, how to look up the function pointer) and would then use a foreign function interface library to make the actual function call.
This would mean that calling a native function would have a slight overhead, but would mean you could keep your VM as-is, while still permitting people to call into native code to integrate with your application.

Advantages and drawbacks to implementing core methods of a scripting language in the underlying language

Background: I am writing a scripting language interpreter as a way to test out some experimental language ideas. I am to the point of writing the core set of standard methods (functions) for built-in types. Some of these methods need to directly interface with the underlying data structures and must be written using the underlying language (Haskell in my case, but that is not important for this question). Others can be written in the scripting language itself if I choose.
Question: What are the advantages and drawbacks to implementing core library functions in either the underlying language or in the language itself?
Example: My language includes as a built-in type Arrays that work just like you think they do -- ordered data grouped together. An Array instance (this is an OO language) has methods inject, map and each. I have implemented inject in Haskell. I could write map and each in Haskell as well, or I could write them in my language using inject. For example:
def map(fn)
  inject([]) do |acc, val|
    acc << fn(val)
  #end inject
#end def map

def each(fn)
  inject(nil) do |acc, val|
    fn val
  #end inject
#end def each
I would like to know what the advantages and drawbacks are of each choice.
The main advantage is that you're eating your own dog food. You get to write more code in your language, and hence get a better idea of what it's like, at least for generic library code. This is not only a good opportunity to notice deficiencies in the language design, but also in the implementation. In particular, you'll find more bugs and you'll find out whether abstractions like these can be implemented efficiently or whether there is a fundamental barrier that forces one to write performance-sensitive code in another language.
This leads, of course, to one disadvantage: It may be slower, either in programmer time or run time performance. However, the first is valuable experience for you, the language designer, and the second should be incentive to optimize the implementation (assuming you care about performance) rather than working around the problem — it weakens your language and doesn't solve the same problem for other users who can't modify the implementation.
There are also advantages for future-proofing the implementation. The language should remain stable in the face of major modifications under the hood, so you'll have to rewrite less code when doing those. Conversely, the functions will be more like other user-defined functions: There is a real risk that, when defining some standard library function or type in the implementation language, subtle differences sneak in that make the type or function behave in a way that can't be emulated by the language.

Does handling recursion need special treatment when designing a compiler?

Function calls are handled via a stack data structure. Is this enough to support recursion?
Having the stack at all is the special treatment the compiler has to have when supporting recursion.
In older programming languages like early versions of FORTRAN, the runtime environment did not have a function stack and each function had exactly one activation record reserved for it somewhere in memory. That meant that recursion wasn't at all possible, because if you recursively called a function you would overwrite its one and only activation record, losing track of the context of how you arrived there.
The introduction of the function stack is what first enabled recursion to be actually expressed in a programming language. Prior to that, programmers would use recursion as a tool for abstractly solving a problem, but would then have to translate that code into iterative logic due to the lack of a call stack.
In order for a programming language to support recursion, it needs to have some mechanism for dynamically maintaining the call stack. This doesn't have to be through an explicit stack; you could in theory dynamically allocate all of the stack frames and chain them together as a linked list. This can be useful, for example, if you want to support coroutines or closures and actually need to hold on to old activation records after the function returns so that their data can be accessed later.
Hope this helps!