
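When I use MSXML and fetch the document as text, the encoding is always missing from the result. -_-;

This happens even though the encoding is clearly present in the declaration when the document is loaded.

In the end, the only fix seems to be declaring it explicitly yourself.


P.S. When I think about how I used to get the encoding in there by saving to a file and reading it back, or by text substitution, I wonder what on earth all that effort was for. -_-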

http://msdn.microsoft.com/library/default.asp?url=/library/en-us/dnxml/html/xmlencodings.asp



Creating New XML Documents with MSXML

Once the XML document is loaded, you can manipulate that XML document using the DOM without concern for any encoding issues because the document is stored in memory as Unicode. All the XML DOM interfaces are based on COM BSTRs, which are 2-byte Unicode strings. This means you can build an MSXML DOM document from scratch in memory that contains all sorts of Unicode characters and all components will be able to share this DOM in memory without any confusion over the meaning of the Unicode character values. When you save this, however, MSXML will encode all data in UTF-8 by default. For example, suppose you do the following:

// Build a one-element document in memory and save it (UTF-8 by default).
var xmldoc = new ActiveXObject("Microsoft.XMLDOM");
var e = xmldoc.createElement("test");
e.text = "å";
xmldoc.appendChild(e);
xmldoc.save("foo.xml");

The following UTF-8 encoded file will result:

<test>Ã¥</test>
Note: The preceding example will only work if you run the code outside the browser environment. Calling the Save method while inside the browser will not produce the same results because of security restrictions.

Even though this looks weird, it is correct. The following test loads up the UTF-8 encoded file and tests whether the UTF-8 is decoded back to the Unicode character value 229. It is:

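// Reload the saved file and check that the UTF-8 bytes decode back to U+00E5.
var xmldoc = new ActiveXObject("Microsoft.XMLDOM");
xmldoc.async = false; // load synchronously so the document is ready before the check
xmldoc.load("foo.xml");
if (xmldoc.documentElement.text.charCodeAt(0) == 229)
{
    WScript.echo("Yippee - it worked !!");
}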
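
The "weird" look of the saved file can be reproduced without MSXML. The following is only a sketch and assumes Node.js (Buffer is a Node global, not part of WSH or MSXML); the variable names are illustrative. It encodes å as UTF-8, shows what those bytes look like when misread as ISO-8859-1, and then decodes them correctly:

```javascript
// Assumption: runs under Node.js, not under Windows Script Host or MSXML.
var original = "\u00e5"; // å, Unicode code point 229

// Encoding to UTF-8 turns å into the two bytes 0xC3 0xA5.
var utf8Bytes = Buffer.from(original, "utf8");

// Misreading those two bytes as ISO-8859-1 gives two characters: "Ã¥".
var misread = utf8Bytes.toString("latin1");

// Decoding the same bytes as UTF-8 restores the single character.
var roundTripped = utf8Bytes.toString("utf8");

console.log(utf8Bytes.length);           // 2
console.log(misread);                    // Ã¥
console.log(roundTripped.charCodeAt(0)); // 229
```

So the two characters shown in the file are just a viewer interpreting UTF-8 bytes with the wrong encoding; the stored data is intact.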

To change the encoding that the XML DOM Save method uses, you need to create an XML declaration with an encoding attribute at the top of your document as follows:

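var pi = xmldoc.createProcessingInstruction("xml",
                        " version='1.0' encoding='ISO-8859-1'");
// Insert before the first child so the declaration lands at the top of the document.
xmldoc.insertBefore(pi, xmldoc.firstChild);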

When you call the save method, you will then get an ISO-8859-1 encoded file as follows:

<?xml version="1.0" encoding="ISO-8859-1"?>
<test>å</test>
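
At the byte level, the difference between the two saved files can be sketched outside MSXML. This assumes Node.js and only mimics the encoding step, not MSXML's Save method; the names are illustrative. In ISO-8859-1 the character å is the single byte 0xE5, while in UTF-8 it takes two bytes:

```javascript
// Assumption: Node.js; this imitates only the byte encoding, not MSXML itself.
var elementText = "<test>\u00e5</test>"; // 14 characters

var latin1File = Buffer.from(elementText, "latin1"); // one byte per character
var utf8File = Buffer.from(elementText, "utf8");     // å expands to two bytes

console.log(latin1File.length);          // 14
console.log(utf8File.length);            // 15
console.log(latin1File[6].toString(16)); // e5
```

Index 6 is the position of å, right after the six characters of the opening tag.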

Now, be careful you don't let the XML property confuse you. The XML property returns a Unicode string. If you call the XML property on the DOMDocument object after creating the ISO-8859-1 encoding declaration, you will get the following Unicode string back:

<?xml version="1.0"?>
<test>å</test>

Notice that the ISO-8859-1 encoding declaration is gone. This is normal. The declaration is dropped so that you can turn around and call LoadXML with this string and it will work. If it were kept, LoadXML would fail with the error message: "Switch from current encoding to specified encoding not supported."