一、关于pure和impure
Haskell作为一门纯函数式语言,副作用剥离是它的一大特色,基本上较为严格的遵循数学函数的形式。但是也带来了一些问题,譬如在实现一个文本编辑器的时候,需要把token记录下来,这个问题在C语言中很容易解决,搞个静态的变量即可,但是在Haskell中就没有那么容易了。可能需要这么做:
getToken >>= \s -> return (s:tokenList) >>= showInfo
getToken返回一个IO Token,通过Monad运算符(>>=)传递到一个lambda表达式,把IO Token中的Token提取出来,然后添加到tokenList后再给后一级函数。
二、Why important of purity?
purity并不是必须的。譬如实现一个编译器,必须要处理命令行输入,就不可能是完全pure的。事实上也不是所有的函数式语言都是pure的,譬如Lisp就是impure的。
三、一个tokenizer的haskell实现
这个tokenizer很简单,只是高亮了C语言的保留字。
分成了三个模块:
- 主程序(parser)
- LexInfo
- DebugInfo
3.1 主parser
因为是文本界面,渲染选择了ansi-terminal这个lib,里面有提供cursor操作、清屏、着色等接口。每次通过getToken函数抓取一个token,传给parser,由parser处理后保存或者做其他处理。
{-# LANGUAGE ForeignFunctionInterface #-}
module Main
(
main
) where
import System.Console.ANSI
import System.Process
import System.IO
import System.Exit
import Control.Monad
import Data.Char(ord)
import LexInfo
import DebugInfo
resetScreen = do
clearScreen
cursorUpLine 1000
return ()
isWordChar :: Char -> Bool
isWordChar ch = ch `elem` wordCharList
where wordCharList = '_':['A'..'Z']++['a'..'z']
{- 'A' == UpArrow; 'B' == DownArrow;
'C' == RightArrow; 'D' == LeftArrow-}
dealArrow :: Char -> IO()
dealArrow ch
| ch == 'A' = cursorBackward 4 >> clearFromCursorToLineEnd >> cursorUpLine 1
| ch == 'B' = cursorBackward 4 >> clearFromCursorToLineEnd >> cursorDownLine 1
| ch == 'C' = cursorBackward 4 >> clearFromCursorToLineEnd >> cursorForward 1
| ch == 'D' = cursorBackward 4 >> clearFromCursorToLineEnd >> cursorBackward 1
| otherwise = return ()
cmdParser :: Char -> IO String
cmdParser ch
| ch == 'q' = cursorBackward 1 >> clearFromCursorToLineEnd >> return "quit"
| ch == 's' = cursorBackward 1 >> clearFromCursorToLineEnd >> return "savefile"
| otherwise = cursorBackward 1 >> return "undefined"
getWord :: String -> IO String
getWord str = getChar >>= go str where
go str ch = if ch == '\DEL'
then do
cursorBackward 3
clearFromCursorToLineEnd
getWord (drop 1 str)
else if ch == '\ESC' -- ^[[A == UpArrow
then do
getChar >> getChar >>= dealArrow
return $reverse str
else if ch == ':'
then
cursorBackward 1 >> clearFromCursorToLineEnd >> getChar >>= cmdParser
else if not (isWordChar ch)
then do
return $reverse (str)
else
getWord $ch:str
{- type 0 : normal word
type 1 : keyword-}
parse :: String -> IO Token
parse str
| str `elem` keywords = do
setSGR [SetColor Foreground Vivid Red] >> cursorBackward (length str + 1)
putStr str >> putChar ' '
setSGR [Reset]
return (Token str 1)
| str == "quit" = return (Token "" (-1))
| str `elem` whiteSpace = return Empty
| otherwise = return (Token str 0)
where keywords = getKeyRsv 'c'
whiteSpace = [" ", "\n", "\t"]
saveToken :: Handle -> Token -> IO (Token)
saveToken tmph token = if tokenType token == -1
then
hClose tmph >> exitSuccess
else do
hPutStr tmph $tokenValue token
return token
getToken :: IO Token
getToken = getWord "" >>= parse
main :: IO ()
main = do
hSetBuffering stdin NoBuffering
hSetBuffering stdout NoBuffering
resetScreen
forever $do
let tokenList = [] in
mainloop tokenList
where mainloop tokenList = getToken >>= \s -> return (s:tokenList) >>= showInfo False >>= mainloop
3.2 LexInfo
LexInfo的话目前只是枚举了C语言的保留字,通过导出getKeyRsv函数,可以方便的添加其他语言对应的关键字。
module LexInfo
(
Token(..),
getKeyRsv
) where
data Token = Token {
tokenValue :: String,
tokenType :: Int
} | Empty deriving (Show)
getKeyRsv :: Char -> [String]
getKeyRsv ch
| ch == 'c' =
["auto", "break", "case", "char", "const", "continue", "default", "do", "double", "else", "enum", "extern", "float", "for", "goto", "if", "int", "long", "register", "return", "short", "signed", "sizeof", "static", "struct", "switch", "typedef", "union", "unsigned", "void", "volatile", "while"]
| otherwise = [""]
3.3 DebugInfo
供调试用,提供showInfo接口,用来插入Monad链中辅助调试。
module DebugInfo
(
showInfo
) where
import LexInfo
showInfo :: Bool -> [Token] -> IO [Token]
showInfo flag tokenList = case flag of
True -> do
print tokenList >> return tokenList
otherwise -> return tokenList
欢迎关注微信公众号