1 Auto-graded problems
These problems are not randomized, so there is no need to first submit a file named req. Each problem below appears as a separate “Assignment” in Gradescope, beginning with “HW1:”.
1.1 DFAs
For each problem submit to Gradescope a .dfa file describing a DFA deciding the given language. Make sure that it is a plain text file that ends in .dfa (not .txt).
Use the finite automata simulator to test the DFAs: http://web.cs.ucdavis.edu/~doty/automata/. Documentation is available at the help link at the top of that web page.
Do not just submit to Gradescope without testing on the simulator. The purpose of this homework is to develop intuition. Gradescope will tell you when your DFA gets an answer wrong, but it will not tell you why it was wrong. You’ll develop more intuition by running the DFA in the simulator, trying to come up with some of your own examples and seeing where they fail, than you will by just using the Gradescope autograder as a black box. Once you think your solution works, submit to Gradescope. If you fail any test cases, go back to the simulator and use it to see why those cases fail. During an exam, there’s no autograder to help you figure out if your answer is correct. Practice right now how to determine for yourself whether it is correct.
Gradescope may give strange errors if your file is not formatted properly. If your file is not
formatted properly, the simulator will tell you this with more user-friendly errors. Also, if you lose
points on a Gradescope test case, try that test case in the simulator to ensure that your DFA is
behaving as you expect.
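One way to practice checking an answer yourself is to compare a DFA against a direct description of the language on every short string. The sketch below is only an illustration of that idea: the function names, the dictionary-based transition table, and the example language (an even number of 0s, which is not one of the homework problems) are all assumptions made here, and this is not the file format used by the simulator or by Gradescope.

```python
from itertools import product

def run_dfa(delta, start, accepting, w):
    """Return True if the DFA accepts the string w."""
    state = start
    for symbol in w:
        state = delta[(state, symbol)]
    return state in accepting

def find_mismatch(delta, start, accepting, alphabet, in_language, max_len=8):
    """Return the first short string on which the DFA and the predicate
    disagree, or None if they agree on all strings up to max_len."""
    for n in range(max_len + 1):
        for chars in product(alphabet, repeat=n):
            w = "".join(chars)
            if run_dfa(delta, start, accepting, w) != in_language(w):
                return w
    return None

# Illustrative language (not a homework problem): even number of 0s.
delta = {
    ("even", "0"): "odd", ("even", "1"): "even",
    ("odd", "0"): "even", ("odd", "1"): "odd",
}

def has_even_zeros(w):
    return w.count("0") % 2 == 0

mismatch = find_mismatch(delta, "even", {"even"}, "01", has_even_zeros)
print(mismatch)  # prints None: no disagreement on strings of length <= 8
```

Agreement on short strings is evidence, not a proof, so it complements rather than replaces stepping through the DFA in the simulator.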
begin and end: {w ∈ {0,1}∗ | w begins with 010 and ends with a 0}
at most three 1s: {w ∈ {0,1}∗ | w contains at most three 1’s}.
no substring: {w ∈ {a,b,c}∗ | w does not contain the substring acab}.
even odd: {w ∈ {a,b}∗ | w starts with a and has even length, or w starts with b and has odd length}.
mod: {w ∈ {0,1}∗ | w is the binary expansion of n ∈ ℕ and n ≡ 3 mod 5}. Assume ε represents 0 and that leading 0’s are allowed. A number n ∈ ℕ is congruent to 3 mod 5 (written n ≡ 3 mod 5) if n is 3 greater than a multiple of 5, i.e., n = 5k + 3 for some k ∈ ℕ. For instance, 3, 8, and 13 are congruent to 3 mod 5.
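The conventions above (ε represents 0, leading 0’s allowed) can be spelled out as a small membership check. This is a sketch for generating your own test cases, not a DFA; the function names are made up for this illustration.

```python
def binary_value(w):
    """Value of w read as a binary numeral; the empty string counts as 0."""
    return int(w, 2) if w else 0

def in_mod_language(w):
    """True if w is a binary expansion of some n with n = 5k + 3."""
    return binary_value(w) % 5 == 3

for w in ["", "11", "011", "1000", "1101", "100"]:
    print(repr(w), binary_value(w), in_mod_language(w))
```

Here "11" and "011" both represent 3 and are in the language, "1000" (8) and "1101" (13) are also in it, while the empty string (0) and "100" (4) are not.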
1.2 Regular expressions
For each problem submit to Gradescope a .regex file with a regular expression deciding the given language. Use the regular expression evaluator to test each regex: http://web.cs.ucdavis.edu/~doty/automata/. Do not test them using the regular expression library of a programming language; typically these are more powerful and have many more features that are not available in the mathematical definition of regular expressions from the textbook. Only the special symbols ( ) * + | are allowed, as well as “input alphabet” symbols: alphanumeric, and . and @.
Note on subexpressions: You may want to use the ability of the regex simulator to define subexpressions that can be used in the main regex. (See the example that loads when you click “Load Default”.) But it is crucial to use variable names for the subexpressions that are not themselves symbols in the input alphabet; e.g., if you write something like A = (A|B|C); , then later when you write A, it’s not clear whether it refers to the symbol A or the subexpression (A|B|C). Instead try something like alphabet = (A|B|C); and use alphabet in subsequent expressions, or X = (A|B|C); if X is not in the input alphabet.
Note on nested stars: Regex algorithms can take a long time to run when the number of nested stars is large. The number of nested stars is the maximum number of ∗’s (or +’s) that appear on any root-to-leaf path in the parse tree of the regex. a∗b∗ has one nested star, (a∗)∗b∗ has two nested stars, and ((a∗)∗b∗)+ has three nested stars. Note that some of these are unnecessary; for instance (a∗)∗b∗ is equivalent to a∗b∗. None of the problems below require more than two nested stars; if you have a regex with more, see if it can be simplified by removing redundant stars such as (a∗)∗.
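To make the definition of nested stars concrete, the following sketch counts them with a small recursive-descent parser. It assumes a regex written with single-character alphabet symbols and the ASCII operators ( ) * + |, without subexpression definitions; the function name star_depth is invented for this illustration and is not part of the simulator.

```python
def star_depth(regex):
    """Maximum number of * or + operators on any root-to-leaf path
    of the parse tree of the regex."""
    pos = 0

    def peek():
        return regex[pos] if pos < len(regex) else None

    def union():
        nonlocal pos
        depth = concat()
        while peek() == "|":
            pos += 1
            depth = max(depth, concat())
        return depth

    def concat():
        depth = 0
        while peek() is not None and peek() not in "|)":
            depth = max(depth, repeat())
        return depth

    def repeat():
        nonlocal pos
        depth = atom()
        # Each postfix * or + adds one level above the operand.
        while peek() is not None and peek() in "*+":
            pos += 1
            depth += 1
        return depth

    def atom():
        nonlocal pos
        if peek() in "*+":
            raise ValueError("operator with no operand at position %d" % pos)
        if peek() == "(":
            pos += 1
            depth = union()
            if peek() != ")":
                raise ValueError("missing closing parenthesis")
            pos += 1
            return depth
        pos += 1  # a single alphabet symbol is a leaf
        return 0

    result = union()
    if pos != len(regex):
        raise ValueError("unexpected character at position %d" % pos)
    return result

print(star_depth("a*b*"))        # 1
print(star_depth("(a*)*b*"))     # 2
print(star_depth("((a*)*b*)+"))  # 3
```

The three printed values match the three examples in the note above.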
first appears more: {x ∈ {0,1}∗ | |x| ≥ 3 and the first symbol of x appears at least three times total in x}
repeat near end: {x ∈ {0,1}∗ | x[|x| − 5] = x[|x| − 3]}
Assume we start indexing at 1, so that x[|x|] is the last symbol in x, and x[1] is the first.
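Since most programming languages index strings from 0, it is easy to be off by one when checking examples for this problem by hand. The helper below is a sketch of the 1-based convention; the name at is made up here, and the choice to raise an error for out-of-range positions is an assumption of this sketch, not a statement about how the problem treats short strings.

```python
def at(x, i):
    """Return x[i] using the 1-based indexing of the problem statement."""
    if not 1 <= i <= len(x):
        raise IndexError("position %d is outside 1..%d" % (i, len(x)))
    return x[i - 1]

x = "010011"
print(at(x, 1))                               # first symbol
print(at(x, len(x)))                          # last symbol
print(at(x, len(x) - 5), at(x, len(x) - 3))   # the two compared positions
```

For this x of length 6, the compared positions are 1 and 3, which both hold the symbol 0.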
email: {x ∈ Σ∗ | x is a syntactically valid email address}
Definition of “syntactically valid email address”: Let Σ = {., @, a, b} contain the alphabetic symbols a and b,¹ as well as the symbols for period . and “at” @. Syntactically valid emails are of the form username@host.domain where username and host are nonempty and may contain alphabetic symbols or ., but never two .’s in a row, nor can either of them begin or end with a ., and domain must be of length 2 or 3 and contain only alphabetic symbols. For example, aaba@aaabb.aba and ab.ba@ab.abb.ba are valid email addresses, but aaabb.aba is not (no @ symbol), nor is .ba@ab.abb.ba (username starts with a .), nor is
¹ It’s not that hard to make a regex that actually uses the full alphanumeric alphabet here, but historically we’ve found that many students’ solutions are correct but use so many subexpressions that they crash the simulator. Using only two alphabetic symbols a and b reduces this problem, even though it makes the examples more artificial-looking.