**Problem Detail:**

- The syntax of a language is a structure.
- The semantic of a language is a structure.
- The semantic of a language is a model of its syntax.

And that's all ? The duality syntax/semantic is just model theory applied to languages ?

(A short answer could be ok; I have already read the wikipedia pages !)

###### Asked By : François

#### Answered By : babou

I am afraid the phrasing of the question misled me (though I did know better) in first seeing model theory as applying to any two arbitrary mathematical structures, and being the study of homomorphisms of mathematical structures.

Actually this is wrong. Model Theory already contains the idea of syntax and semantics, more or less as I define it below. It studies *theories*, which are sets of sentences in a formal language, and *models* which are interpretation of these sentences in abstract mathematical structures. This is close to what I call *language* below, and I guess this is also close to what the author of the question calls *language*.

So the proper answer to the question is that even model theory has a concept of syntax and semantics, where syntax is a *theory*, i.e. a formal representation structure such as a universal algebra, and semantics is an interpretation (i.e. a morphism) in some abstract mathematical domain.

Hence **the "duality syntax/semantic" is already present in model theory**. There is nothing new on that side.

However, **what may make a difference is the understanding of the word language**, as it could be interpreted as going beyond what is usually concerned by the syntactic algebras usually addressed by model theory.

Actually, the question does not define the word language, and it could be formal languages, programming languages, natural languages, with some unsaid understanding of what is intended.

So two possible answers could be:

model theory is the study of languages with their syntax and semantics, and there is nothing to add, or

a language can be defined by an arbitrary syntactic definition (meaning a computable structure) and a homomorphic(?) mapping in some abstract mathematical structure. The question mark is because I fear that the definition of the mapping may raise some issues.

The second view is possibly less formal, or less constrained. Whatever the case, I am leaving below my initial answer (which corresponds to the second view), since several of the issues it is addressing may still be relevant and of interest to computer scientist, even though the presentation may be debatable from the point of view of model theory.

It does not bring anything new from a formal point of view, since the study of formal system that are considered in syntax already had to go all the way to Turing computability, in order to address the Entscheidungsproblem, and that is as far as we may hope to go in terms of syntax. I may however help understanding some more practical issues.

*Your comments are welcome, including suggestions to erase my former, more naive, though possibly more intuitive, answer.*

*The first section is the direct answer to the question. The other two sections may be skipped. They are attempts to explore some examples or consequences, or some practical aspects encountered by computer scientists, in particular in the design of programming languages.*

## Syntax, semantics and languages

The view of the issue as expressed in the question is mainly "OK". My main criticism though is that **the reference to the concept of language is critical**, because directly related to the concept of syntax, not separable from the concept of syntax.

As a consequence, there is something circular, kind of tautological, about your statement. The question is why do we use this specific terminology of "*syntax and semantics*" for some models, for some homomorphically related pairs of structures, and not for others. Where is the difference? And a correlated question is: **what is a language?**

Clearly there is no restriction on semantics: any topic/structure can be the object/content of a discourse. Hence, the fundamental distinction is to be looked for on the syntax side. My own understanding is that what distinguishes such a homomorphic pair is the domain of the homomorphism: **the syntax structure has to be concrete and concretely manipulatable**. Basically a syntactic structure is a finite collection of concrete objects, usually idealized as symbols, and a finitely defined collection of rules to compose in diverse ways any unboundedly finite number of copies of these symbols, and to transform such compositions into others compositions. The word *syntax* comes from 'Ancient Greek: σύνταξις "coordination" from σύν *syn*, "together," and τάξις *táxis*, "an ordering"'.

Fundamentally, syntax is physical. The concept of symbol is only a mean to abstract away the choice of physical instanciation of the symbols as objects such as sounds, gestures, glyphs, objects, stones (which gave us the word *calculus*), electric or magnetic state, presence and absence of a signal, etc., or composition of other symbols.

Hence, the theory of syntax is essentially what is better known as the theory of computation, which is the study of rule based manipulation of symbols. Whatever we express, transmit, receive, compute, memorize, analyze, is processed through some syntactic symbolic representation.

But what we are interested in is not the symbolic representations, but some semantic concepts or objects (possibly themselves physical) that are intended to be represented by the symbolic compositions at hand. And the correspondence is defined by the semantic homomorphism into its semantic codomain.

Then, in the most general acceptation of the word, **a language is the association of a syntax structure and a semantic structure through a homomorphic mapping**. Or to state it in the terms of the question, it is a pair of structures such that one is a model of the other, with the additional requirement that the latter be a syntactic structure. **But, in order to give such a definition, you must first define what is a syntactic structure.** And that definition stands on its own, independently of the concepts of *language* or *semantics*: **Syntax is the study of finitely defined manipulations (including transmission and memorization) of unboundedly finite compositions of copies of a finite set of symbols**. Ot in simpler terms, **syntax concerns the definition, analysis and manipulation of representations**. You may read *computation theory* and *information theory*.

This is in agreement with the concept of homomorphism: the meaning of the whole is derived from the meaning of the parts. But this is a meta-remark. I insist less on defining semantics since it seems that any structure (that makes sense?) can play that role.

## Further remarks

I do not know if the view of linguists on natural language is exactly the same, or as precise regarding the homomorphic relation, but it should not be too far from it. The concepts of syntax and semantics arose from language analysis and logic, which I assume were the same topic at some point in history (actually diverging pretty late, in the 19th century according to Wikipedia page on syntax). Still, it seems that some modern linguists follow somewhat similar views "*since they regard syntax to be the study of an abstract formal system*".

The point is that most mathematical domains, abstract concepts, and other objects of discourse generally elude us, and can only be contemplated and dealt with through mechanisms to represent them and (some of) theirs values, i.e. through syntactic representation. In its simplest form, it can be a pointing finger to represent a direction.

The main characteristic of a syntactic domain is that it is a concrete representation, finitely defined through some formal system such as the formal languages dear to computer scientists. But these syntactic domains, which are very precisely the domain of computation, i.e. of syntactic transformations, are very limited, as we learn from the study of computability. For example, they are denumerable, while many abstract domains we are interested in are not denumerable.

As a consequence, it is often the case that syntax can only represent a small part of the mathematical domains we study. This is typically the case for real numbers, almost all of which do not have a (finite) syntactic representation, the others being the computable reals.

Choosing a data structure to represent the values of some abstract domain we wish to compute with is just finding an appropriate syntax, for your semantic problem, whether to express some conceptual semantic derivation (computation) or to have simpler relation with semantics (semantic morphism), or to optimize some aspect of communication and memorization (space. speed, reliability).

For users who want an example, computing the sum of XXVII and XCII, I mean 11011 and 1011100, is just a manipulation of the digits to get the representation of a number (119) that is the sum of the numbers represented by the initial two representations. The positional notation happens to be easier to manipulate, to be more economical in space (number and variety of symbols), and to have a simpler way to express the homomorphism. The syntax changes, the semantics is the same, and in both correspond to languages to express natural numbers (well, not quite for Roman numerals).

IMHO (but some of this may be a matter of individual perspective), if syntax should be only what can be represented and manipulated, semantics can be any mathematical domain, or more generally any abstract domain of discourse, including the domains used for syntax. Hence you can have a theory of syntax where you study syntactic representation systems. Formal language theory is nothing else.

Note that the the idea of a syntac-semantic *duality* should not induce the idea that there is some sort of symmetry between syntax and semantics.

## Syntax and semantics in computer science practice

It is true that we often pretend to address semantic issues, for example when defining formally the semantics of a programming language $L$. But that may be seen as an abuse. What we really do is use a standard language $S$ that is supposed to have an accepted semantics and define the semantics of $L$ in terms of the semantics of $S$. The difficulty is that we can never access semantics directly, but only through some syntactic proxy, even when the semantics is itself syntactic in nature. This is why we use metalanguages such as Backus-Naur Form (BNF).

There are however various mathematical techniques to define semantic domains. Among these techniques, you have the definition of syntactic models where the syntax (possibly up to quotient by some equivalence relation) is its own model. Actually, one of the major contribution of Dana Scott to computer science was to conceive non-syntactic models of lambda-calculus which is (initially at least) a purely syntactic theory (see for example Continuation Models for the Lambda Calculus With Constructors, B. Petit (2012).

A characteristic of most mathematical theories is that they may have many models, including non-standard models. "*A non-standard model is a model of a theory that is not isomorphic to the intended model (or standard model)*". That also applies to syntactic theories. Actually, non-standard interpretation of languages is a very fruitful area of research in computer science, related to abstract interpretation, type analysis, flow analysis, partial evaluation and many other approaches to program analysis. To take a very trivial, purely syntactic example, a non-standard interpretation of a program can be a pretty-printed version of that program. A type checker is usually an abstract interpretation of the program.

As a consequence, it is clear that the chosen syntax may not capture, express, all the properties of the intended semantic domain. However, anything that is considered as syntactically correct must have some associated semantics through the semantic morphism, even if it is some special error value. But it is improper, though often done, to state that the syntax is defined by a context-free grammars, and then consider programs with undeclared variables as being incorrect. It cannot be syntactically incorrect for such a reason, because CF grammars cannot check variable declarations. And I would not know what it means to be semantically incorrect, other than being possibly a special error value of the semantic structure.

The dream of programming language designers or implementor is that any program that can only fail, or that can fail in some specific way, should be syntactically incorrect. Hence, they have been increasing the descriptive power of syntax to include as much checking as possible, limited only by computability constraints. This did lead to the use of Turing complete formalisms, of which van Wijngaarden's W-grammars are an example. But this is a pipe dream. Computability theory tells us that semantic properties of syntactically expressed functions cannot be checked syntactically, as soon as the language has Turing power, which we usually expected in programming languages.

In practice, this checking for intrinsic correctness (i.e. independent of problem specification), or for properties to be used, is often performed in compilers (or in formal semantic definitions) by what is called static semantics. Being purely computational, static semantics can be as well considered as part of the syntax.

In the case of programming languages, the semantic domain is a computation domain. Given that we allow Turing complete formalisms to express the syntax, it becomes possible for the syntax to express completely the semantics. But developing this angle is getting far away from the original question. Still, this is the main reason why the distinction between syntax and semantics is so difficult to make precise in programming languages. This may also be the reason why universal Turing machines are possible.

A last point is that defining the semantics as a structure that is a model of the syntax may not be enough. It could be a model in several different ways. So it is essential to include the morphism in the description of the semantics. That is typically what is done with denotational semantics of programming languages, and a lot of other formal systems.

_{ Note: This was rather complex and hard to write. If you are to criticize it, I would appreciate explicit and precise comments. It would also be useful to other users. Thank you. }

###### Best Answer from StackOverflow

Question Source : http://cs.stackexchange.com/questions/45291

## 0 comments:

## Post a Comment

Let us know your responses and feedback