
[Solved] Swift: fixing garbled text caused by nbsp

iOS crifan

In the app's messaging, some messages contain an nbsp right after the @name. It is produced by the web client:

<a ng-click="selectMessageType(0);openChat('user-93dcd6b9-ff71-4cb3-8bf9-3f516f0de1a8')" style="color:#1878c7;font-weight:bold;" class="ng-scope">@Anglia</a>

<span class="ng-scope">‍&nbsp;只是测试at功能</span>

What the iOS app actually receives is:

"text" : "@Anglia 只是测试at功能",

When displayed in a UITextView, the nbsp shows up as garbled characters.

The goal here: for each received text String, check whether it contains the Unicode nbsp character, and if it does, replace it with an ordinary space.
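A minimal sketch of that replacement, written in current Swift (Swift 5) rather than the Swift 2 used elsewhere in this post; the function name `replacingNoBreakSpaces(in:)` is hypothetical. It walks the string's `unicodeScalars`, so each code point is compared on its own regardless of how Swift groups scalars into `Character`s, and U+00A0 is swapped for an ordinary U+0020 space.

```swift
/// Returns `text` with every U+00A0 NO-BREAK SPACE replaced by an ordinary U+0020 space.
/// (Hypothetical helper name; Swift 5 syntax.)
func replacingNoBreakSpaces(in text: String) -> String {
    let noBreakSpace: Unicode.Scalar = "\u{00A0}"
    let space: Unicode.Scalar = " "
    var result = String.UnicodeScalarView()
    // Compare code point by code point and copy everything else unchanged.
    for scalar in text.unicodeScalars {
        result.append(scalar == noBreakSpace ? space : scalar)
    }
    return String(result)
}
```

For example, `replacingNoBreakSpaces(in: "@Anglia\u{00A0}test")` returns `"@Anglia test"` with a plain space.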

swift unicode nbsp

json – How do I decode HTML entities in swift? – Stack Overflow

From this I learned that:

"&nbsp;"    : "\u{00a0}"

Unicode Character ‘NO-BREAK SPACE’ (U+00A0)

swift decode html entities

How do I encode HTML entities in swift 2.0? – Stack Overflow

GitHub – TheFlow95/DecodeHTML.swift: A String extension to decode HTML entities to string

So the code became:

import UIKit

extension String {

    var decodedHtml:String {

        var decodedHtmlStr = self

        print("decodedHtmlStr=\(decodedHtmlStr)")

        do {

            if let encodedData = decodedHtmlStr.dataUsingEncoding(NSUTF8StringEncoding) {

                let attributedOptions : [String: AnyObject] = [

                    NSDocumentTypeDocumentAttribute        : NSHTMLTextDocumentType,

                    NSCharacterEncodingDocumentAttribute    : NSUTF8StringEncoding

                ]

                print("attributedOptions=\(attributedOptions)") //["DocumentType": NSHTML, "CharacterEncoding": 4]

                let attributedString = try NSAttributedString(data: encodedData, options: attributedOptions, documentAttributes: nil)

                print("attributedString=\(attributedString)")

                /*

                attributedString=@Anglia again测试{

                    NSColor = "UIDeviceRGBColorSpace 0 0 0 1";

                    NSFont = "<UICTFont: 0x7e01b0a0> font-family: \"Times New Roman\"; font-weight: normal; font-style: normal; font-size: 12.00pt";

                    NSKern = 0;

                    NSParagraphStyle = "Alignment 4, LineSpacing 0, ParagraphSpacing 0, ParagraphSpacingBefore 0, HeadIndent 0, TailIndent 0, FirstLineHeadIndent 0, LineHeight 15/0, LineHeightMultiple 0, LineBreakMode 0, Tabs (\n), DefaultTabInterval 36, Blocks (\n), Lists (\n), BaseWritingDirection 0, HyphenationFactor 0, TighteningForTruncation NO, HeaderLevel 0";

                    NSStrokeColor = "UIDeviceRGBColorSpace 0 0 0 1";

                    NSStrokeWidth = 0;

                }

                */

                decodedHtmlStr = attributedString.string

                print("decodedHtmlStr=\(decodedHtmlStr)")

            }

        } catch {

            print("decodedHtml error: \(error)")

        }

       

        return decodedHtmlStr

    }

}

The result was still garbled.

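Only then did I realize that the iOS side was not receiving a literal &nbsp; entity at all.

So I needed a way to find the Unicode code point of each character in the string, in order to work out which character was actually being sent.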

Searched for:

swift get char unicode value

swift – How can I get the Unicode code point(s) of a Character? – Stack Overflow

Inspecting the string through its unicodeScalars view in lldb:

(lldb) po newTextMsg.text.unicodeScalars

▿ StringUnicodeScalarView("@Anglia again测试")

  - [0] : "@"
  - [1] : "A"
  - [2] : "n"
  - [3] : "g"
  - [4] : "l"
  - [5] : "i"
  - [6] : "a"
  - [7] : "\u{200D}"
  - [8] : "\u{00A0}"
  - [9] : "a"
  - [10] : "g"
  - [11] : "a"
  - [12] : "i"
  - [13] : "n"
  - [14] : "\u{6D4B}"
  - [15] : "\u{8BD5}"
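The same inspection can be done in code instead of in lldb, which is handy for logging. A sketch in Swift 5 (`codePoints(of:)` is a hypothetical helper name) that formats every scalar in the usual U+XXXX notation:

```swift
/// Lists the Unicode code points of `text` as "U+XXXX" strings (at least 4 hex digits).
/// (Hypothetical helper name; Swift 5 syntax.)
func codePoints(of text: String) -> [String] {
    return text.unicodeScalars.map { scalar in
        let hex = String(scalar.value, radix: 16, uppercase: true)
        // Left-pad with zeros so U+A0 is printed as U+00A0.
        let padded = String(repeating: "0", count: max(0, 4 - hex.count)) + hex
        return "U+" + padded
    }
}
```

For instance, `codePoints(of: "a\u{200D}\u{00A0}")` gives `["U+0061", "U+200D", "U+00A0"]`, which makes the two invisible characters stand out.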

This shows that the two suspicious characters are:

\u{200D}

\u{00A0}

Then I looked them up in a Unicode table:

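- Zero Width Joiner (U+200D): symbols, text, icons; HTML: &zwj; – General Punctuation – Unicode® Character Encyclopedia

- No-Break Space (U+00A0): symbols, text, icons; HTML: &nbsp; – Latin-1 Supplement – Unicode® Character Encyclopedia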
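U+200D being a zero-width *joiner* also affects how Swift counts the string: it extends the preceding grapheme cluster, so Swift folds it into the "a" before it instead of treating it as a `Character` of its own, and it only shows up at the `unicodeScalars` level. A quick check in Swift 5, using the string from the lldb dump above (the variable name is illustrative):

```swift
// The received text as shown by lldb: "@Anglia", ZWJ, NBSP, "again", "测试".
let received = "@Anglia\u{200D}\u{00A0}again\u{6D4B}\u{8BD5}"

// U+200D joins the preceding "a", so it is not counted as a separate Character...
print(received.count)                                 // 15
// ...but it is still present as a separate code point.
print(received.unicodeScalars.count)                  // 16
print(received.unicodeScalars.contains("\u{200D}"))   // true
```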

ios unicode not support 200d

iphone – Does IOS support all Unicode emojies? – Stack Overflow

Ios7 displays unicode wrong | Apple Support Communities

ios not support some unicode char

ios UITextView unicode

ios UITextView unicode  00a0

ios – UITextView control line breaks – Stack Overflow

Detect Unicode characters in NSString on iPhone – Stack Overflow

iOS: Controlling line breaks in UITextView – 數碼維基

ios7 – UITextField Right Alignment iOS 7 – Stack Overflow

Searched for:

how to write a Unicode character in Swift code

swift unicode char

The Swift Programming Language (Swift 2.2): Strings and Characters
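In a Swift string literal, an arbitrary Unicode scalar is written as `\u{n}`, where n is 1 to 8 hexadecimal digits. A small illustration in Swift 5 syntax; the helper `character(forCodePoint:)` is hypothetical. `Unicode.Scalar(UInt32)` is failable because not every integer is a valid scalar (surrogate values such as 0xD800, for example).

```swift
// Writing the two characters found above directly in source code.
let zeroWidthJoiner = "\u{200D}"   // hex digits are case-insensitive: same as "\u{200d}"
let noBreakSpace = "\u{A0}"        // leading zeros are optional: same as "\u{00A0}"

/// Builds a Character from a numeric code point at runtime;
/// returns nil when the value is not a valid Unicode scalar.
func character(forCodePoint value: UInt32) -> Character? {
    guard let scalar = Unicode.Scalar(value) else { return nil }
    return Character(scalar)
}
```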

[Summary]

So the final code is as follows (the decodedHtml extension is the same as the one above):


            if let text = newMessageDict["text"].string {
                // The NSAttributedString-based decodedHtml is no longer used here:
                //let decodedHtmlStr = text.decodedHtml
                //newTextMsg.text = decodedHtmlStr
                newTextMsg.text = filterUnsupportChar(text)
            }

func filterUnsupportChar(originStr: String) -> String {
    var filtedStr = originStr

    // iOS UITextView does not support U+200D (ZERO WIDTH JOINER), so remove it here
    filtedStr = filtedStr.stringByReplacingOccurrencesOfString("\u{200d}", withString: "")
    print("removed 200d: filtedStr=\(filtedStr)")

    // iOS seems to support U+00A0 (NO-BREAK SPACE), so there is no need to replace it here
//    filtedStr = filtedStr.stringByReplacingOccurrencesOfString("\u{00a0}", withString: " ")
//    print("replaced 00a0: filtedStr=\(filtedStr)")

    return filtedStr
}
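For reference, the same filtering can be written in current Swift without Foundation. This is a sketch, not the author's code: the name `removingZeroWidthJoiners(from:)` is hypothetical. It drops U+200D at the scalar level and, like the function above, leaves U+00A0 alone. Working on `unicodeScalars` matters here: as a `Character`, the joiner is merged into the preceding letter (`"a\u{200D}"` is a single `Character`), so a Character-by-Character filter would never see it on its own.

```swift
/// Removes every U+200D ZERO WIDTH JOINER from `text`; all other scalars,
/// including U+00A0 NO-BREAK SPACE, are kept unchanged. (Hypothetical name.)
func removingZeroWidthJoiners(from text: String) -> String {
    var kept = String.UnicodeScalarView()
    for scalar in text.unicodeScalars where scalar != "\u{200D}" {
        kept.append(scalar)
    }
    return String(kept)
}
```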

Output:

(lldb) po filtedStr.unicodeScalars

▿ StringUnicodeScalarView("@Anglia test")

  - [0] : "@"
  - [1] : "A"
  - [2] : "n"
  - [3] : "g"
  - [4] : "l"
  - [5] : "i"
  - [6] : "a"
  - [7] : "\u{200D}"
  - [8] : "\u{00A0}"
  - [9] : "t"
  - [10] : "e"
  - [11] : "s"
  - [12] : "t"

(lldb) po filtedStr

"@Anglia test"

removed 200d: filtedStr=@Anglia test

replaced 00a0: filtedStr=@Anglia test

[Postscript 20160422]

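Later, text like this showed up:

&zwj;&nbsp

and the version above did not seem to decode it.

The more serious problem, however, is that the NSAttributedString-based decodedHtml must run on the main thread, which makes it quite inconvenient.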

For details, see:

[Solved] Error in iOS 8.1: NSAttributedString initialization crashes with EXC_BAD_ACCESS code=1 address 0xbbadbeef

swift html entity decode

json – How do I decode HTML entities in swift? – Stack Overflow

-> Decoding HTML Entities in Swift

Later I came across:

Decoding HTML Entities in Swift

and:

StringExtensionHTML on CocoaPods.org

-> adela-chang/StringExtensionHTML

-> StringExtensionHTML/StringExtensionHTML.swift at master · adela-chang/StringExtensionHTML

Using the latter directly is enough.

Along the way I ran into:

[Solved] Swift code error: Value of type String has no member extend

[Solved] Swift code error: distance is unavailable: call the distanceTo(end) method on the index
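Both errors come from Swift's String API changing between versions: the code had to move from the older `extend(_:)` and the global `distance(_:_:)` to Swift 2's `appendContentsOf(_:)` and `distanceTo(_:)`. In Swift 3 and later these are spelled `append(contentsOf:)` and `distance(from:to:)` on the collection. A small sketch of the offset computation used by the decoder below, in that newer spelling (the function name is hypothetical):

```swift
/// Offset recorded for one entity replacement: the entity spanning
/// start..<end in `text` is replaced by a single character, so the
/// offset is 1 - (entity length). (Hypothetical helper name.)
func replacementOffset(in text: String, from start: String.Index, to end: String.Index) -> Int {
    return 1 - text.distance(from: start, to: end)   // Swift 2: 1 - start.distanceTo(end)
}

// Example: "&nbsp;" (6 characters) becomes one character.
let sample = "a&nbsp;b"
let entityStart = sample.firstIndex(of: "&")!
let entityEnd = sample.index(after: sample.firstIndex(of: ";")!)
var copied = "x"
copied.append(contentsOf: sample[entityStart..<entityEnd])  // Swift 2: appendContentsOf
print(replacementOffset(in: sample, from: entityStart, to: entityEnd))  // -5
print(copied)                                                           // x&nbsp;
```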

The final code:

// Mapping from XML/HTML character entity reference to character

// From http://en.wikipedia.org/wiki/List_of_XML_and_HTML_character_entity_references

let HtmlCharacterEntitiesDict : [String: Character] = [

    // XML predefined entities:

    "&quot;"    : "\"",

    "&amp;"      : "&",

    "&apos;"    : "'",

    "&lt;"      : "<",

    "&gt;"      : ">",

   

    // HTML character entity references:

    "&nbsp;"    : "\u{00A0}",

    "&iexcl;"    : "\u{00A1}",

    "&cent;"    : "\u{00A2}",

    "&pound;"    : "\u{00A3}",

    "&curren;"  : "\u{00A4}",

    "&yen;"      : "\u{00A5}",

    "&brvbar;"  : "\u{00A6}",

    "&sect;"    : "\u{00A7}",

    "&uml;"      : "\u{00A8}",

    "&copy;"    : "\u{00A9}",

    "&ordf;"    : "\u{00AA}",

    "&laquo;"    : "\u{00AB}",

    "&not;"      : "\u{00AC}",

    "&shy;"      : "\u{00AD}",

    "&reg;"      : "\u{00AE}",

    "&macr;"    : "\u{00AF}",

    "&deg;"      : "\u{00B0}",

    "&plusmn;"  : "\u{00B1}",

    "&sup2;"    : "\u{00B2}",

    "&sup3;"    : "\u{00B3}",

    "&acute;"    : "\u{00B4}",

    "&micro;"    : "\u{00B5}",

    "&para;"    : "\u{00B6}",

    "&middot;"  : "\u{00B7}",

    "&cedil;"    : "\u{00B8}",

    "&sup1;"    : "\u{00B9}",

    "&ordm;"    : "\u{00BA}",

    "&raquo;"    : "\u{00BB}",

    "&frac14;"  : "\u{00BC}",

    "&frac12;"  : "\u{00BD}",

    "&frac34;"  : "\u{00BE}",

    "&iquest;"  : "\u{00BF}",

    "&Agrave;"  : "\u{00C0}",

    "&Aacute;"  : "\u{00C1}",

    "&Acirc;"    : "\u{00C2}",

    "&Atilde;"  : "\u{00C3}",

    "&Auml;"    : "\u{00C4}",

    "&Aring;"    : "\u{00C5}",

    "&AElig;"    : "\u{00C6}",

    "&Ccedil;"  : "\u{00C7}",

    "&Egrave;"  : "\u{00C8}",

    "&Eacute;"  : "\u{00C9}",

    "&Ecirc;"    : "\u{00CA}",

    "&Euml;"    : "\u{00CB}",

    "&Igrave;"  : "\u{00CC}",

    "&Iacute;"  : "\u{00CD}",

    "&Icirc;"    : "\u{00CE}",

    "&Iuml;"    : "\u{00CF}",

    "&ETH;"      : "\u{00D0}",

    "&Ntilde;"  : "\u{00D1}",

    "&Ograve;"  : "\u{00D2}",

    "&Oacute;"  : "\u{00D3}",

    "&Ocirc;"    : "\u{00D4}",

    "&Otilde;"  : "\u{00D5}",

    "&Ouml;"    : "\u{00D6}",

    "&times;"    : "\u{00D7}",

    "&Oslash;"  : "\u{00D8}",

    "&Ugrave;"  : "\u{00D9}",

    "&Uacute;"  : "\u{00DA}",

    "&Ucirc;"    : "\u{00DB}",

    "&Uuml;"    : "\u{00DC}",

    "&Yacute;"  : "\u{00DD}",

    "&THORN;"    : "\u{00DE}",

    "&szlig;"    : "\u{00DF}",

    "&agrave;"  : "\u{00E0}",

    "&aacute;"  : "\u{00E1}",

    "&acirc;"    : "\u{00E2}",

    "&atilde;"  : "\u{00E3}",

    "&auml;"    : "\u{00E4}",

    "&aring;"    : "\u{00E5}",

    "&aelig;"    : "\u{00E6}",

    "&ccedil;"  : "\u{00E7}",

    "&egrave;"  : "\u{00E8}",

    "&eacute;"  : "\u{00E9}",

    "&ecirc;"    : "\u{00EA}",

    "&euml;"    : "\u{00EB}",

    "&igrave;"  : "\u{00EC}",

    "&iacute;"  : "\u{00ED}",

    "&icirc;"    : "\u{00EE}",

    "&iuml;"    : "\u{00EF}",

    "&eth;"      : "\u{00F0}",

    "&ntilde;"  : "\u{00F1}",

    "&ograve;"  : "\u{00F2}",

    "&oacute;"  : "\u{00F3}",

    "&ocirc;"    : "\u{00F4}",

    "&otilde;"  : "\u{00F5}",

    "&ouml;"    : "\u{00F6}",

    "&divide;"  : "\u{00F7}",

    "&oslash;"  : "\u{00F8}",

    "&ugrave;"  : "\u{00F9}",

    "&uacute;"  : "\u{00FA}",

    "&ucirc;"    : "\u{00FB}",

    "&uuml;"    : "\u{00FC}",

    "&yacute;"  : "\u{00FD}",

    "&thorn;"    : "\u{00FE}",

    "&yuml;"    : "\u{00FF}",

    "&OElig;"    : "\u{0152}",

    "&oelig;"    : "\u{0153}",

    "&Scaron;"  : "\u{0160}",

    "&scaron;"  : "\u{0161}",

    "&Yuml;"    : "\u{0178}",

    "&fnof;"    : "\u{0192}",

    "&circ;"    : "\u{02C6}",

    "&tilde;"    : "\u{02DC}",

    "&Alpha;"    : "\u{0391}",

    "&Beta;"    : "\u{0392}",

    "&Gamma;"    : "\u{0393}",

    "&Delta;"    : "\u{0394}",

    "&Epsilon;"  : "\u{0395}",

    "&Zeta;"    : "\u{0396}",

    "&Eta;"      : "\u{0397}",

    "&Theta;"    : "\u{0398}",

    "&Iota;"    : "\u{0399}",

    "&Kappa;"    : "\u{039A}",

    "&Lambda;"  : "\u{039B}",

    "&Mu;"      : "\u{039C}",

    "&Nu;"      : "\u{039D}",

    "&Xi;"      : "\u{039E}",

    "&Omicron;"  : "\u{039F}",

    "&Pi;"      : "\u{03A0}",

    "&Rho;"      : "\u{03A1}",

    "&Sigma;"    : "\u{03A3}",

    "&Tau;"      : "\u{03A4}",

    "&Upsilon;"  : "\u{03A5}",

    "&Phi;"      : "\u{03A6}",

    "&Chi;"      : "\u{03A7}",

    "&Psi;"      : "\u{03A8}",

    "&Omega;"    : "\u{03A9}",

    "&alpha;"    : "\u{03B1}",

    "&beta;"    : "\u{03B2}",

    "&gamma;"    : "\u{03B3}",

    "&delta;"    : "\u{03B4}",

    "&epsilon;"  : "\u{03B5}",

    "&zeta;"    : "\u{03B6}",

    "&eta;"      : "\u{03B7}",

    "&theta;"    : "\u{03B8}",

    "&iota;"    : "\u{03B9}",

    "&kappa;"    : "\u{03BA}",

    "&lambda;"  : "\u{03BB}",

    "&mu;"      : "\u{03BC}",

    "&nu;"      : "\u{03BD}",

    "&xi;"      : "\u{03BE}",

    "&omicron;"  : "\u{03BF}",

    "&pi;"      : "\u{03C0}",

    "&rho;"      : "\u{03C1}",

    "&sigmaf;"  : "\u{03C2}",

    "&sigma;"    : "\u{03C3}",

    "&tau;"      : "\u{03C4}",

    "&upsilon;"  : "\u{03C5}",

    "&phi;"      : "\u{03C6}",

    "&chi;"      : "\u{03C7}",

    "&psi;"      : "\u{03C8}",

    "&omega;"    : "\u{03C9}",

    "&thetasym;" : "\u{03D1}",

    "&upsih;"    : "\u{03D2}",

    "&piv;"      : "\u{03D6}",

    "&ensp;"    : "\u{2002}",

    "&emsp;"    : "\u{2003}",

    "&thinsp;"  : "\u{2009}",

    "&zwnj;"    : "\u{200C}",

    "&zwj;"      : "\u{200D}",

    "&lrm;"      : "\u{200E}",

    "&rlm;"      : "\u{200F}",

    "&ndash;"    : "\u{2013}",

    "&mdash;"    : "\u{2014}",

    "&lsquo;"    : "\u{2018}",

    "&rsquo;"    : "\u{2019}",

    "&sbquo;"    : "\u{201A}",

    "&ldquo;"    : "\u{201C}",

    "&rdquo;"    : "\u{201D}",

    "&bdquo;"    : "\u{201E}",

    "&dagger;"  : "\u{2020}",

    "&Dagger;"  : "\u{2021}",

    "&bull;"    : "\u{2022}",

    "&hellip;"  : "\u{2026}",

    "&permil;"  : "\u{2030}",

    "&prime;"    : "\u{2032}",

    "&Prime;"    : "\u{2033}",

    "&lsaquo;"  : "\u{2039}",

    "&rsaquo;"  : "\u{203A}",

    "&oline;"    : "\u{203E}",

    "&frasl;"    : "\u{2044}",

    "&euro;"    : "\u{20AC}",

    "&image;"    : "\u{2111}",

    "&weierp;"  : "\u{2118}",

    "&real;"    : "\u{211C}",

    "&trade;"    : "\u{2122}",

    "&alefsym;"  : "\u{2135}",

    "&larr;"    : "\u{2190}",

    "&uarr;"    : "\u{2191}",

    "&rarr;"    : "\u{2192}",

    "&darr;"    : "\u{2193}",

    "&harr;"    : "\u{2194}",

    "&crarr;"    : "\u{21B5}",

    "&lArr;"    : "\u{21D0}",

    "&uArr;"    : "\u{21D1}",

    "&rArr;"    : "\u{21D2}",

    "&dArr;"    : "\u{21D3}",

    "&hArr;"    : "\u{21D4}",

    "&forall;"  : "\u{2200}",

    "&part;"    : "\u{2202}",

    "&exist;"    : "\u{2203}",

    "&empty;"    : "\u{2205}",

    "&nabla;"    : "\u{2207}",

    "&isin;"    : "\u{2208}",

    "&notin;"    : "\u{2209}",

    "&ni;"      : "\u{220B}",

    "&prod;"    : "\u{220F}",

    "&sum;"      : "\u{2211}",

    "&minus;"    : "\u{2212}",

    "&lowast;"  : "\u{2217}",

    "&radic;"    : "\u{221A}",

    "&prop;"    : "\u{221D}",

    "&infin;"    : "\u{221E}",

    "&ang;"      : "\u{2220}",

    "&and;"      : "\u{2227}",

    "&or;"      : "\u{2228}",

    "&cap;"      : "\u{2229}",

    "&cup;"      : "\u{222A}",

    "&int;"      : "\u{222B}",

    "&there4;"  : "\u{2234}",

    "&sim;"      : "\u{223C}",

    "&cong;"    : "\u{2245}",

    "&asymp;"    : "\u{2248}",

    "&ne;"      : "\u{2260}",

    "&equiv;"    : "\u{2261}",

    "&le;"      : "\u{2264}",

    "&ge;"      : "\u{2265}",

    "&sub;"      : "\u{2282}",

    "&sup;"      : "\u{2283}",

    "&nsub;"    : "\u{2284}",

    "&sube;"    : "\u{2286}",

    "&supe;"    : "\u{2287}",

    "&oplus;"    : "\u{2295}",

    "&otimes;"  : "\u{2297}",

    "&perp;"    : "\u{22A5}",

    "&sdot;"    : "\u{22C5}",

    "&lceil;"    : "\u{2308}",

    "&rceil;"    : "\u{2309}",

    "&lfloor;"  : "\u{230A}",

    "&rfloor;"  : "\u{230B}",

    "&lang;"    : "\u{2329}",

    "&rang;"    : "\u{232A}",

    "&loz;"      : "\u{25CA}",

    "&spades;"  : "\u{2660}",

    "&clubs;"    : "\u{2663}",

    "&hearts;"  : "\u{2665}",

    "&diams;"    : "\u{2666}",

   

]

extension String {

   

    //method 1: use NSAttributedString and NSHTMLTextDocumentType to filter out html entity

    //pros: can filter out all html tags ?

    //cons:

    //1. must run in main thread otherwise crash

    //2. will remove html tags ?

//    var decodedHtml:String {

//        var decodedHtmlStr = self

//       

//        //print("decodedHtmlStr=\(decodedHtmlStr)")

//        do {

//            if let encodedData = decodedHtmlStr.dataUsingEncoding(NSUTF8StringEncoding) {

//                let attributedOptions : [String: AnyObject] = [

//                    NSDocumentTypeDocumentAttribute         : NSHTMLTextDocumentType,

//                    NSCharacterEncodingDocumentAttribute    : NSUTF8StringEncoding

//                ]

//                //print("attributedOptions=\(attributedOptions)") //["DocumentType": NSHTML, "CharacterEncoding": 4]

//                let attributedString = try NSAttributedString(data: encodedData, options: attributedOptions, documentAttributes: nil)

//                //print("attributedString=\(attributedString)")

//                /*

//                attributedString=@Anglia‍ again测试{

//                    NSColor = "UIDeviceRGBColorSpace 0 0 0 1";

//                    NSFont = "<UICTFont: 0x7e01b0a0> font-family: \"Times New Roman\"; font-weight: normal; font-style: normal; font-size: 12.00pt";

//                    NSKern = 0;

//                    NSParagraphStyle = "Alignment 4, LineSpacing 0, ParagraphSpacing 0, ParagraphSpacingBefore 0, HeadIndent 0, TailIndent 0, FirstLineHeadIndent 0, LineHeight 15/0, LineHeightMultiple 0, LineBreakMode 0, Tabs (\n), DefaultTabInterval 36, Blocks (\n), Lists (\n), BaseWritingDirection 0, HyphenationFactor 0, TighteningForTruncation NO, HeaderLevel 0";

//                    NSStrokeColor = "UIDeviceRGBColorSpace 0 0 0 1";

//                    NSStrokeWidth = 0;

//                }

//                */

//                decodedHtmlStr = attributedString.string

//                print("decodedHtmlStr=\(decodedHtmlStr)")

//            }

//        } catch {

//            print("decodedHtml error: \(error)")

//        }

//       

//        return decodedHtmlStr

//    }

   

   

    //method 2: use char entity decode

    //pros: only decode html entity -> not filter out html tags

    //cons:

   

   

    /// Returns a new string made by removing in the `String`

    /// anything enclosed in HTML brackets <>

    public var strippedHtmlTags: String {

        return stringByReplacingOccurrencesOfString("<[^>]+>", withString: "", options: .RegularExpressionSearch, range: nil);

    }

    /// Returns a new string made by replacing in the `String`

    /// all HTML character entity references with the corresponding

    /// character.

    public var decodedHtmlEntities: String {

        return decodeHTMLEntities().decodedString

    }

   

    /// Returns a tuple containing the string made by replacing in the

    /// `String` all HTML character entity references with the corresponding

    /// character. Also returned is an array of offset information describing

    /// the location and length offsets for each replacement. This allows

    /// any attributes that may be associated with substrings within the

    /// `String` to be adjusted correctly.

    func decodeHTMLEntities() -> (decodedString: String, replacementOffsets: [(index: String.Index, offset: String.Index.Distance)]) {

       

        // ===== Utility functions =====

       

        // Record the index offsets of each replacement

        // This allows anyone to correctly adjust any attributes that may be

        // associated with substrings within the string

        var replacementOffsets: [(index: String.Index, offset: String.Index.Distance)] = []

       

        // Convert the number in the string to the corresponding

        // Unicode character, e.g.

        //    decodeNumeric("64", 10)   -> "@"

        //    decodeNumeric("20ac", 16) -> "€"

        func decodeNumeric(string : String, base : Int32) -> Character? {

            let code = UInt32(strtoul(string, nil, base))

            return Character(UnicodeScalar(code))

        }

       

        // Decode the HTML character entity to the corresponding

        // Unicode character, return `nil` for invalid input.

        //     decode("&#64;")    -> "@"

        //     decode("&#x20ac;") -> "€"

        //     decode("&lt;")     -> "<"

        //     decode("&foo;")    -> nil

        func decode(entity : String) -> Character? {

            if entity.hasPrefix("&#x") || entity.hasPrefix("&#X"){

                return decodeNumeric(entity.substringFromIndex(entity.startIndex.advancedBy(3)), base: 16)

            } else if entity.hasPrefix("&#") {

                return decodeNumeric(entity.substringFromIndex(entity.startIndex.advancedBy(2)), base: 10)

            } else {

                return HtmlCharacterEntitiesDict[entity]

            }

        }

       

        // ===== Method starts here =====

       

        var result = ""

        var position = startIndex

       

        // Find the next ‘&’ and copy the characters preceding it to `result`:

        while let ampRange = self.rangeOfString("&", range: position ..< endIndex) {

            //result.extend(self[position ..< ampRange.startIndex])

            result.appendContentsOf(self[position ..< ampRange.startIndex])

            position = ampRange.startIndex

           

            // Find the next ‘;’ and copy everything from ‘&’ to ‘;’ into `entity`

            if let semiRange = self.rangeOfString(";", range: position ..< endIndex) {

                let entity = self[position ..< semiRange.endIndex]

                if let decoded = decode(entity) {

                   

                    // Replace by decoded character:

                    result.append(decoded)

                    // Record offset

                    //let offset = (index: semiRange.endIndex, offset: 1 - distance(position, semiRange.endIndex))

                    let offset = (index: semiRange.endIndex, offset: 1 - position.distanceTo(semiRange.endIndex))

                    replacementOffsets.append(offset)

                   

                } else {

                   

                    // Invalid entity, copy verbatim:

                    //result.extend(entity)

                    result.appendContentsOf(entity)

                }

                position = semiRange.endIndex

            } else {

                // No matching ‘;’.

                break

            }

        }

       

        // Copy remaining characters to `result`:

        //result.extend(self[position ..< endIndex])

        result.appendContentsOf(self[position ..< endIndex])

        // Return results

        return (decodedString: result, replacementOffsets: replacementOffsets)

    }

}

With this code, HTML entities are decoded correctly.
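For current Swift, where calls like `rangeOfString`, `substringFromIndex` and `advancedBy` have been renamed or removed, here is a condensed re-sketch of the same scan in Swift 5: copy text up to each "&", treat everything up to the next ";" as the entity, decode numeric references (`&#64;`, `&#x20ac;`) or look up named ones, and copy unknown entities verbatim. It is not the StringExtensionHTML code: the entity table is cut down to a few entries (an assumption; extend it from the full table above), offset tracking is left out, and the function names are hypothetical.

```swift
// A few named entities from the table above (assumption: extend as needed).
let namedEntities: [String: Character] = [
    "&quot;": "\"", "&amp;": "&", "&apos;": "'", "&lt;": "<", "&gt;": ">",
    "&nbsp;": "\u{00A0}", "&zwj;": "\u{200D}",
]

/// Decodes one entity such as "&lt;", "&#64;" or "&#x20ac;" (including "&" and ";").
/// Returns nil for unknown names and for invalid code points.
func decodeEntity(_ entity: Substring) -> Character? {
    let body = entity.dropFirst().dropLast()  // strip the leading "&" and trailing ";"
    let digits: Substring
    let radix: Int
    if body.hasPrefix("#x") || body.hasPrefix("#X") {
        digits = body.dropFirst(2)   // hexadecimal reference, e.g. &#x20ac;
        radix = 16
    } else if body.hasPrefix("#") {
        digits = body.dropFirst()    // decimal reference, e.g. &#64;
        radix = 10
    } else {
        return namedEntities[String(entity)]  // named reference, e.g. &lt;
    }
    guard let value = UInt32(digits, radix: radix),
          let scalar = Unicode.Scalar(value) else { return nil }
    return Character(scalar)
}

/// Replaces every "&...;" entity that decodeEntity understands; unknown
/// entities and an "&" with no later ";" are copied verbatim.
func decodingHTMLEntities(_ text: String) -> String {
    var result = ""
    var position = text.startIndex
    // Copy the characters before the next "&" to `result`.
    while let amp = text[position...].firstIndex(of: "&") {
        result.append(contentsOf: text[position..<amp])
        position = amp
        // Everything from "&" up to and including the next ";" is the entity.
        guard let semi = text[amp...].firstIndex(of: ";") else { break }
        let entity = text[amp...semi]
        if let decoded = decodeEntity(entity) {
            result.append(decoded)
        } else {
            result.append(contentsOf: entity)  // invalid entity: copy verbatim
        }
        position = text.index(after: semi)
    }
    result.append(contentsOf: text[position...])  // copy the remaining characters
    return result
}
```

Like the Swift 2 version, this needs the terminating ";": `decodingHTMLEntities("&zwj;&nbsp")` decodes the first entity but leaves `&nbsp` as-is.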

Please credit when reposting: 在路上 » [Solved] Swift: fixing garbled text caused by nbsp
