## Determining number of states in a DFA - definition

For Σ = {a,b,c,d,e,...,z}, consider the set L of words w such that the last symbol of w has not appeared before. For example, the words apple, google, k, and ε are in L, but the words potato and nutrition are not in L. Suppose we want to construct a DFA for this language. How many states will it have (minimally)? Describe the DFA succinctly: do not attempt to draw it, but explain its formal definition (e.g. states and transitions) using a suitable mathematical notation.
I don't need the whole definition, just a start as to how many states it will have and why. From there I'm confident I can figure it out.

Some hints:
You're going to need one state for each possible subset of Σ so that you can keep track of which characters you've seen so far.
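For each non-empty subset of Σ, you'll need another state to represent "this set of characters, where the last character read had already appeared earlier in the word." (The empty set needs no such twin, since at that point nothing has been read yet.)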
I'll leave the rest of the details to you and you'll have to think about how to formally prove this is correct.
Hope this helps!
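A minimal Python sketch for sanity-checking the hints on small alphabets (an illustration, not part of the original answer). The helper names are hypothetical: `build_dfa` constructs only the reachable states, each a pair `(seen, last_new)` recording the symbols read so far and whether the most recent one was new, and `count_minimal_states` merges indistinguishable states with Moore-style partition refinement. The alphabets "a" through "abcde" stand in for the 26 letters, which would give far too many states to enumerate.

```python
def build_dfa(alphabet):
    """Reachable DFA for L = {w : the last symbol of w does not occur earlier in w}.

    A state is (seen, last_new): the set of symbols read so far, and whether
    the most recent symbol was new when it was read. The empty word is in L,
    so the start state (frozenset(), True) is accepting.
    """
    start = (frozenset(), True)
    states, delta, frontier = {start}, {}, [start]
    while frontier:
        state = frontier.pop()
        seen, _ = state
        for c in alphabet:
            nxt = (seen | {c}, c not in seen)
            delta[(state, c)] = nxt
            if nxt not in states:
                states.add(nxt)
                frontier.append(nxt)
    accepting = {s for s in states if s[1]}
    return states, delta, accepting


def count_minimal_states(alphabet):
    """Number of states left after Moore-style partition refinement."""
    states, delta, accepting = build_dfa(alphabet)
    block = {s: int(s in accepting) for s in states}  # initial accept/reject split
    while True:
        # Signature of a state: its own block plus the blocks of its successors.
        sigs = {s: (block[s], tuple(block[delta[(s, c)]] for c in alphabet))
                for s in states}
        ids = {}
        new_block = {s: ids.setdefault(sig, len(ids)) for s, sig in sigs.items()}
        if len(ids) == len(set(block.values())):
            return len(ids)  # no block was split, so the partition is stable
        block = new_block


for k in range(1, 6):
    print(k, count_minimal_states("abcde"[:k]))
```

For k = 1 to 5 this prints the counts 3, 7, 15, 31 and 63, i.e. 2^(k+1) - 1: every non-empty set of seen symbols appears twice (last symbol new or repeated), plus the start state for the empty set. Two different sets are told apart by appending a symbol that lies in one but not the other, and the two variants of the same set by the empty suffix, which suggests 2^27 - 1 states for the full 26-letter alphabet.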

## Related

### Is the language of all strings over the alphabet “a,b,c” with the same number of substrings “ab” & “ba” regular?

Is the language of all strings over the alphabet "a,b,c" with the same number of substrings "ab" & "ba" regular? I believe the answer is NO, but it is hard to give a formal proof of it, or even an informal argument. Any ideas on how to approach this?

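It's clearly not regular. How is an FA going to recognize (abc)^n c (cba)^n? Strings like this are in your language, right? The argument is a simple one, based on the fact that there are infinitely many equivalence classes under the indistinguishability relation I_L: for m ≠ n, the prefixes (abc)^m and (abc)^n are distinguished by the suffix c(cba)^n, because (abc)^n c (cba)^n contains n occurrences each of ab and ba, while (abc)^m c (cba)^n contains m occurrences of ab and n of ba.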
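The distinguishability argument can be spot-checked by brute force for small n. This minimal Python sketch uses hypothetical helper names (`count_pairs`, `in_lang`, `distinguishes`); it confirms that the suffix c(cba)^n accepts the prefix (abc)^n but rejects (abc)^m for every m ≠ n, so the prefixes (abc)^0, (abc)^1, ... must all end in different states of any DFA.

```python
def count_pairs(w, pair):
    """Count (possibly overlapping) occurrences of a two-symbol substring."""
    return sum(1 for i in range(len(w) - 1) if w[i:i + 2] == pair)


def in_lang(w):
    """Same number of 'ab' and 'ba' substrings."""
    return count_pairs(w, "ab") == count_pairs(w, "ba")


def distinguishes(m, n):
    """True if the suffix c(cba)^n separates the prefixes (abc)^m and (abc)^n."""
    suffix = "c" + "cba" * n
    return in_lang("abc" * n + suffix) and not in_lang("abc" * m + suffix)


N = 8
assert all(distinguishes(m, n) for m in range(N) for n in range(N) if m != n)
print(f"prefixes (abc)^0 .. (abc)^{N - 1} are pairwise distinguishable")
```

A finite check like this proves nothing on its own, but it mirrors the general argument: the suffix c(cba)^n works for every n.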

The most common way to prove a language is NOT regular is to use one of the Pumping Lemmas. Using the lemma is a little tricky, since it has all those "exists" and so on. To prove a language L is not regular using the pumping lemma, you have to show that for any integer p there is a word w in L of length n ≥ p such that, for every way to decompose w as xyz with len(xy) <= p and y non-empty, there exists an i ≥ 0 such that x(y^i)z (repeating the y bit i times) is NOT in L. Whooo!

I'll show how the proof looks for the "same number of a's and b's" language; it should be straightforward to convert to your case. For any given p, take the word a^p b^p (p a's followed by p b's), of length n = 2p. However you decompose this into xyz with |xy| <= p, y will contain only a's. Thus, pumping the y part (for example with i = 2) gives the word more a's than b's, so it does NOT belong to L.

If you need intuition on why this works: to verify whether a word belongs to one of these languages, you need to be able to count to arbitrarily large numbers. Regular languages, however, are described by finite automata, and no finite automaton has the infinite number of states required to represent all those numbers. (The Wikipedia article should have a formal proof.)

EDIT: It looks like you can't straight up use the pumping lemma in this particular case by pumping up alone: if y is a single character, repeating it never changes the number of ab and ba substrings (aba becoming abbbba makes no difference, and so on). Pumping down (i = 0) can still help for a carefully chosen word, but the equivalence class approach suggested by Patrick87 will probably turn out to be cleaner than any of the dirty hacks you would need to make the pumping lemma applicable here.
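Both halves of this answer can be explored with a small brute-force search. The Python sketch below is an illustration under stated assumptions: the function names are hypothetical, only a few pumping exponents are tried, and only small p are checked. `splits` enumerates every decomposition w = xyz with |xy| <= p and y non-empty, and `refuted` reports whether every such split can be pumped out of the language. It confirms the a^p b^p argument and the EDIT's observation that pumping up a single symbol is useless for the ab/ba language; it also suggests that pumping down (i = 0) on the word (abc)^p c (cba)^p leaves that language for every allowed split, at least for the p tested.

```python
def count_pairs(w, pair):
    """Count (possibly overlapping) occurrences of a two-symbol substring."""
    return sum(1 for i in range(len(w) - 1) if w[i:i + 2] == pair)


def equal_as_bs(w):
    return w.count("a") == w.count("b")


def equal_ab_ba(w):
    return count_pairs(w, "ab") == count_pairs(w, "ba")


def splits(w, p):
    """All (x, y, z) with w == x + y + z, len(x + y) <= p and y non-empty."""
    for j in range(1, p + 1):      # j = len(x + y)
        for i in range(j):         # i = len(x), so len(y) = j - i >= 1
            yield w[:i], w[i:j], w[j:]


def refuted(w, p, in_lang, exponents):
    """True if every allowed split has an exponent that pumps w out of the language."""
    return all(
        any(not in_lang(x + y * k + z) for k in exponents)
        for x, y, z in splits(w, p)
    )


# The a^p b^p argument: pumping up (i = 2) always breaks "same number of a's and b's".
for p in range(1, 7):
    assert refuted("a" * p + "b" * p, p, equal_as_bs, [2])

# The EDIT's observation: pumping a single b up leaves aba-style words in the language.
assert all(equal_ab_ba("a" + "b" * k + "a") for k in range(1, 6))

# On (abc)^p c (cba)^p, pumping up alone never refutes every split,
# but pumping down (i = 0) does, for the small p checked here.
for p in range(1, 7):
    w = "abc" * p + "c" + "cba" * p
    assert equal_ab_ba(w)
    assert not refuted(w, p, equal_ab_ba, [2, 3])
    assert refuted(w, p, equal_ab_ba, [0])
```

Trying only a few exponents is a deliberate simplification: finding one failing exponent per split is all the lemma requires, so a `True` from `refuted` is meaningful, while a `False` only says those particular exponents were not enough.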

### How to show that if the language L is regular, then L' is regular?

Let L be any regular language and a ∈ Σ. How can I show that the language L' = {uav | uv ∈ L} is regular too? Wikipedia says one way to prove it is to reduce it to a known regular language, but I don't understand how to do that in this case. Hope somebody can help.

There are lots of ways to show this. I think an argument where we start from a DFA and build an automaton out of it is particularly easy to visualize. Imagine a DFA for your language L; let's call it M. Imagine it sprawled out in diagram form on a table. Now imagine making a copy of M, called M', and spreading it out next to M on the table. Next, for every state q of M, add a new transition on the symbol a from q to the corresponding state q' of M'. Now consider the aggregate machine whose start state is the start state of M and whose accepting states are the accepting states of M'. This machine first reads a prefix u while running M, then reads the inserted a somewhere in the middle by jumping across to M', and then continues reading the rest v from exactly where it left off; it accepts exactly when uv ∈ L. This is the language we were going for, and we have defined a perfectly reasonable NFA for it (it is nondeterministic because, on a, a state q of M can either follow its own transition or jump to q'). Since any language accepted by an NFA is regular, our language is regular.
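The two-copy construction can be simulated directly. This is a minimal Python sketch, assuming a hypothetical tuple encoding of a DFA as `(start, accepting, delta)` and a hypothetical example M for "words over {a, b} with an even number of a's". An NFA state is `(q, copy)`, where copy 0 plays the role of M and copy 1 the role of M'. The simulation is checked against the definition of L' (delete one occurrence of a and test membership in L) on every short word.

```python
from itertools import product

# A DFA is (start, accepting, delta) with delta[(state, symbol)] -> state.
# Hypothetical example M: words over {a, b} with an even number of a's.
EVEN_AS = (
    "even",
    {"even"},
    {("even", "a"): "odd", ("odd", "a"): "even",
     ("even", "b"): "even", ("odd", "b"): "odd"},
)


def dfa_accepts(dfa, word):
    q, accepting, delta = dfa
    for c in word:
        q = delta[(q, c)]
    return q in accepting


def inserted_nfa_accepts(dfa, a, word):
    """Run the two-copy NFA for L' = {u a v : uv in L} on word.

    An NFA state is (q, copy): copy 0 is M (inserted a not read yet),
    copy 1 is M' (inserted a already read). The only extra edges are
    (q, 0) --a--> (q, 1); accepting states are (f, 1) with f accepting in M.
    """
    start, accepting, delta = dfa
    current = {(start, 0)}
    for c in word:
        nxt = set()
        for q, copy in current:
            nxt.add((delta[(q, c)], copy))   # ordinary move inside M or M'
            if copy == 0 and c == a:
                nxt.add((q, 1))              # jump to the corresponding q'
        current = nxt
    return any(copy == 1 and q in accepting for q, copy in current)


def in_l_prime(dfa, a, word):
    """Definition check: deleting some occurrence of a leaves a word of L."""
    return any(word[k] == a and dfa_accepts(dfa, word[:k] + word[k + 1:])
               for k in range(len(word)))


for n in range(8):
    for letters in product("ab", repeat=n):
        w = "".join(letters)
        assert inserted_nfa_accepts(EVEN_AS, "a", w) == in_l_prime(EVEN_AS, "a", w)
```

For this particular M, L' is the set of words with an odd number of a's, so `inserted_nfa_accepts(EVEN_AS, "a", "ab")` is `True` while `inserted_nfa_accepts(EVEN_AS, "a", "b")` is `False`.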

### Will L = {a*b*} be classified as a regular language?

Will L = {a*b*} be classified as a regular language? I am confused because I know that L = {a^n b^n} is not regular. What difference does the Kleene star make?

It does make a difference whether you have L = {a^n b^n} or L = {a*b*}. The language a^n b^n is one where you must have the same number of a's and b's, for example {aaabbb, ab, aabb, etc.}. As you said, this is not a regular language. L = {a*b*} is different: you can have any number of a's followed by any number of b's (including 0). Some examples are {a, b, aaab, aabbb, aabbbb, etc.}. As you can see, it is different from the {a^n b^n} language, where you needed to have the same number of a's and b's. And yes, a*b* is regular by its nature. If you want a good explanation of why it is regular, you can check this How to prove a language is regular; they might have a better explanation than me (: I hope it helped you

The language described by the regular expression a*b* is regular by definition. Regular expressions cannot describe any non-regular language and are indeed one of the ways of defining the regular languages. {a^n b^n : n > 0} (this would be a formally complete way of describing it), on the other hand, cannot be described by a regular expression. Intuitively, when reaching the border between the a's and the b's you need to remember n. Since n is not bounded, no finite-memory device can do that. In a*b* you only need to remember that from now on only b should appear; this is very finite. The two stars are in some sense unrelated; each expands its block independently of the other.
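To make the "very finite" memory concrete, here is a minimal Python sketch of a three-state DFA for a*b* (the state names are illustrative, not from the original answers), cross-checked against Python's `re.fullmatch` on every word over {a, b} up to length 8. The only thing it remembers is whether a b has been seen yet, which is exactly the information a*b* needs and far less than the unbounded counter a^n b^n would require.

```python
import re
from itertools import product

# DFA for a*b* over {a, b} (state names are illustrative):
#   "A"    - only a's read so far (start, accepting)
#   "B"    - at least one b read; from now on only b may appear (accepting)
#   "DEAD" - an a appeared after a b (rejecting sink)
DELTA = {
    ("A", "a"): "A", ("A", "b"): "B",
    ("B", "a"): "DEAD", ("B", "b"): "B",
    ("DEAD", "a"): "DEAD", ("DEAD", "b"): "DEAD",
}
ACCEPTING = {"A", "B"}


def accepts_a_star_b_star(word):
    state = "A"
    for c in word:
        state = DELTA[(state, c)]
    return state in ACCEPTING


# Cross-check against Python's regex engine on every word up to length 8.
for n in range(9):
    for letters in product("ab", repeat=n):
        w = "".join(letters)
        assert accepts_a_star_b_star(w) == bool(re.fullmatch(r"a*b*", w))
```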

### Regular language?

I have a compiler question: determine whether {(ab)^n | n >= 0} is a regular language. I can draw an NFA for it, but if I use the pumping lemma I get a contradictory answer. Can anyone help me?

I understand that this thread is old, but just in case this could help another student in the same situation, here is some discussion. This language is regular, and you cannot show it to be non-regular using the pumping lemma. To see that it's regular, it suffices to produce a regular expression that generates it or an NFA that recognizes it. The regular expression is trivial: (ab)*. An NFA is easy: two states; the initial state is accepting, the other is not; there is a transition from the initial state to the other on a, and from the other back to the initial state on b. Done.

Let's see why the pumping lemma can't be used on this. To use the pumping lemma, you need to pick a candidate string to pump. For this language, no matter how long you make the string, its first two symbols are always ab. Since this could always be the substring y that constitutes the loop the pumping lemma says exists, and pumping it only produces (ab)^m for some other m, there's no way to rule out, using the pumping lemma alone, that the language is regular. (Note: for sufficiently long strings, you can't rule out the substring ba either.) Since you don't get to pick the substring that gets pumped (there are restrictions on where it can be taken, but those are part of the lemma, not something you decide), if any of the substrings work, you lose and the pumping lemma fails to demonstrate anything.

To show, e.g., that L = {a^k b^k | k >= 0} is not regular using the pumping lemma, you need to pick a string for which it doesn't matter which substring is taken, so long as it satisfies the hypotheses of the PL. This is why, for instance, taking a^n b^n works: all substrings satisfying the hypotheses of the PL are of the form a+, and pumping one of them changes the number of a's without changing the number of b's.
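A minimal Python sketch of the two-state automaton described above, with a missing transition treated as "no run left" (reject). The dictionary encoding and the state names `init` and `other` are assumptions for illustration. It is cross-checked against `re.fullmatch` on a few words, and then shows the adversary's split x = "", y = "ab", which satisfies |xy| <= p for any p >= 2 and keeps every pumped word in the language.

```python
import re

# Two-state automaton: "init" is initial and accepting;
# init --a--> other, other --b--> init. A missing transition means reject.
DELTA = {("init", "a"): "other", ("other", "b"): "init"}


def accepts(word):
    state = "init"
    for c in word:
        state = DELTA.get((state, c))
        if state is None:
            return False   # no transition: the automaton has no run left
    return state == "init"


for w in ["", "ab", "abab", "a", "ba", "aab", "abb"]:
    assert accepts(w) == bool(re.fullmatch(r"(ab)*", w))

# The split x = "", y = "ab" has |xy| = 2 <= p for any p >= 2,
# and pumping it just produces another word of the form (ab)^m.
w = "ab" * 5
x, y, z = w[:0], w[0:2], w[2:]
assert all(accepts(x + y * i + z) for i in range(10))
```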

### Using Closure Properties to prove Regularity

Here's a homework problem: is L_4 regular? Let L_4 = L*, where L = {0^i 1^i | i >= 1}. I know L is non-regular, and I know that regular languages are closed under Kleene star, so my assumption is that L_4 is non-regular. However, my professor provided an example of the above in which L = {0^p | p is prime}, for which he said L* was regular, by proving that L* was equal to L(000* + e) by showing that each was a subset of the other (e in this case means the empty word). So his method involved forming a regex of 0^p, but how can I do that when I essentially have one already?

Regular languages are closed under Kleene star. That is, if a language R is regular, so is R*. But the reasoning doesn't work in the other direction: there are nonregular languages P for which P* is actually regular. You mentioned one such P in your question: the set of strings 0^p where p is prime. It is easy to use the pumping lemmas for regular and context-free languages to show that P is not even context-free (it is context-sensitive). However, P* is equivalent to the language of strings 0^q where q is a sum of zero or more primes. This holds exactly for q = 0 (the empty string) and every q >= 2 (every such q is a sum of 2s and 3s), so P* can be recognized with a 3-state DFA, even though P itself is not regular. So L being context-free but not regular has no bearing on whether your L_4 = L* is regular or not. If you can construct a DFA that recognizes L_4, as I did for P* above, then clearly it's regular. In the process of trying to find a DFA that works, you'll probably see some pattern emerge that can be used as the basis for a pumping argument. The Myhill-Nerode theorem is another approach to proving a language non-regular, and is useful if the language lends itself to analysis of prefixes and distinguishing extensions. If the strings fall into a finite number of equivalence classes under the Myhill-Nerode relation (x and y are equivalent when, for every z, xz is in the language exactly when yz is), then the language can be recognized with a DFA containing that many states; if there are infinitely many classes, it is not regular.
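The claim about P* can be checked numerically. This minimal Python sketch (helper names are hypothetical) compares a brute-force "is q a sum of zero or more primes?" dynamic program against the 3-state DFA that counts zeros up to a cap of 2 and accepts in states 0 and 2. It mirrors the professor's subset argument on a finite range rather than proving it.

```python
def is_prime(n):
    return n >= 2 and all(n % d for d in range(2, int(n ** 0.5) + 1))


def in_p_star_bruteforce(q):
    """Is 0^q a concatenation of zero or more words 0^p with p prime?"""
    reachable = [True] + [False] * q   # reachable[k]: k is a sum of primes
    for k in range(1, q + 1):
        reachable[k] = any(reachable[k - p] for p in range(2, k + 1) if is_prime(p))
    return reachable[q]


def in_p_star_dfa(q):
    """3-state DFA: states 0, 1, 2 count zeros read, capped at 2.

    State 0 (empty string) and state 2 (two or more zeros) accept.
    """
    state = 0
    for _ in range(q):
        state = min(state + 1, 2)
    return state != 1


assert all(in_p_star_dfa(q) == in_p_star_bruteforce(q) for q in range(60))
```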